ABSTRACT



A standardized method has been developed which will convert prose
training materials into a form which forces the trainees to read the
material with at least a minimal level of comprehension. The materials,
called programmed prose materials, are developed in an objective manner
amenable to computer production. Phase I of the project involved an
extensive investigation of a new technique, called the reading-storage
test, for measuring the learning that occurs during reading so that the
effectiveness of programmed prose could be properly assessed in Phases II,
III, & IV. This technical report covers the Phase I and Phase II research.
In six Phase I experiments, the reading-storage test was compared to two
other types of tests. The results suggested that the completely objective
reading-storage test provides a better measure of the primary effects of
reading than its two closest competitors, i.e., the cloze test, which is
developed objectively but scored subjectively, and the paraphrase test,
which is developed subjectively but may be scored objectively. In the
Phase II experiment, programmed prose was compared to regular prose under
low and high motivation conditions. The programmed prose facilitated
learning under the low motivation condition, and inhibited learning under
the high motivation condition, as had been hypothesized. It was concluded
that: (a) the objectively developed reading-storage test is valid as a
measure of the learning, understanding, comprehending, or information storing
that occurs during reading, and (b) programmed prose facilitates learning
in reading situations wherein attention wanes.


Introduction

This research project is primarily concerned with investigating the
effectiveness of a method for increasing learning during reading. Another
important purpose is to investigate the validity of a new method for measuring
learning during reading. This introduction section will present an overview
of the entire project as well as an overview of this technical report.

Previous research results have supported the theory which contends that, in
certain situations where low amounts of learning are expected, the amount
learned can be increased by forcing the learner to interact or behaviorally
respond to the stimulus materials. A new technique which forces an interaction
with prose materials has been proposed. It is unique in that it can be
objectively developed; a computer can be programmed to take prose as input and
produce the training material as output. The goal of this research project is
to determine the conditions under which these materials, called programmed
prose materials, facilitate learning.

In order to investigate the effectiveness of the programmed prose technique,
an appropriate measure of learning was needed. Previous measurement techniques,
such as multiple choice questions, have usually involved subjective judgments
on the part of the test developer. Subjectively developed criteria have two
serious disadvantages: (a) they are costly to produce since someone must be
paid a high rate for low production rates, and (b) the replicability of the
results for different people and different materials is always questionable. A
new method has also been developed for automatically producing standardized
objective tests for amount learned from prose materials. This type of test,
called a reading-storage test, has the advantage of being programmable for
a computer. Given prose as input, a programmed computer can output a test for
the prose material.

The entire project is to be conducted in four phases. Phases I and II were
accomplished during this past year and Phases III and IV are to be accomplished
during the next year. In Phase I, the validity of the reading-storage technique
was evaluated. In Phase II, the effectiveness of programmed prose was evaluated
in both low and high motivation conditions. Phase III will assess the
effectiveness of programmed prose of varying levels of material difficulty.
Phase IV will investigate the effectiveness of the method in a situation where
it would be expected to be maximal, i.e., inner city high school youths reading
Navy training material.

This technical report will contain the research results relevant to
Phase I and Phase II, i.e., the first year of a two-year project. Before
presenting the details of the research, two background rationale sections will
be presented. The first section will provide the background for investigating
the effectiveness of programmed prose, and the second will provide the
background for investigating the validity of the reading-storage test.



Background for Programmed Prose



In 1917, Thorndike stated that the vice of the poor reader is to say the
words to himself without actively making judgments concerning what they reveal.
Thus, Thorndike seems to be one of the first to recognize the importance of
getting the learner actively involved in the process of reading in order to
facilitate learning.

More recently, this area of concern has been researched by Rothkopf (1965).
Rothkopf was interested in understanding the role of student responses in
programmed instruction and this prompted him to investigate specially developed
adjunct questions as facilitators of learning from prose materials. Rothkopf
(1970) has stated that some response on the learner's part is necessary to
transform a nominal stimulus into an effective stimulus, and he has described
three classes of activity involved in effecting this transformation. Class I,
Orientation, involves getting Ss into the vicinity of instructional objects
and keeping them there for suitable periods. Class II, Object Acquisition,
involves the selection and procurement of appropriate instructional
objects. Class III, Translation and Processing, involves systematic eye
fixations, translation into speech, discrimination and processing. Much of
Rothkopf's research has focused upon the effectiveness of questions and their
placement as facilitators of learning from prose. The use of questions does not
seem to be highly effective in this regard, at least when compared to an
admonition to "read carefully and slowly" (see Carver, 1972a). However,
Rothkopf's research does represent a concerted effort to thoroughly investigate
a practical technique for facilitating learning during reading by attempting to
force the learner to interact with prose materials.

Another investigator who has been concerned with the activities that the
student engages in when confronted with instructional materials is Anderson
(1970). The following excerpt from Anderson (1970) summarizes the nature of the
problem from an instructional point of view:

One cannot be sure what a student is doing when he is looking at the
pages of a textbook. He may be reading every line or he may be
skimming the page. He may test himself on the implication of
what he reads, but he may not. He may give selective emphasis
to certain sections as students seem to do when they underline
portions of a text. The student's emphasis is not necessarily the
emphasis that the teacher desires. The student may spend more time
on sections that he has trouble understanding, or he may skip
difficult sections. If the student gets bored or tired he may begin
to daydream or even go to sleep [p. 349].



Anderson points out that, traditionally, the word "attention" has been
used to designate the process whereby learners translate nominal stimuli into
effective stimuli. He contends also that the control of attention is probably
most critical when the learners are bored, tired, influenced to work hurriedly,
or given difficult material. Anderson suggests the following series of mediating
processes necessary for learning: (a) noticing the stimulus, (b) translating
it into internal speech, (c) evoking images for the things and events named by
the words, and (d) conceiving relationships among the imagined things or events.
He contends that the chief problem for educational engineering is "to discover
how to alter the characteristics of instructional tasks so as to force students
to do all of the processing required for learning [p. 363]." But, Anderson also
acknowledges that some procedures which force attention may in fact inhibit the
complete processing necessary for understanding and learning.

Two instructional techniques have received a great deal of research
attention as procedures which attempt to increase learning by forcing an
interaction between the learner and prose passages. The research on the
question-technique by Rothkopf has already been noted. The other technique is
the cloze-technique, i.e., the deletion of certain words in some regular manner
from a passage and substituting underlined blank spaces. Cloze was first
recommended by Taylor (1953) as a method of measuring readability, but it has
also been researched as a teaching technique (see annotated review by Jongsma,
1971).

Both the cloze-technique and the question-technique have advantages and
disadvantages which are inherent to the approach, i.e., advantages and
disadvantages not associated with the results they produce. The cloze-technique
has the advantage that instructional materials can be developed from ordinary
prose passages in a completely objective manner. The cloze-technique can be
applied to any prose passage by any person (or machine) with the same general
results expected no matter who or what developed the materials. Conversely,
instructional materials must be subjectively developed from prose materials
under the question-technique, and this is a distinct disadvantage associated
with this technique. The question-technique must be applied to prose materials
using the best subjective judgments of the human producers with no standards or
guidelines regarding how to write the questions, the type of questions to use,
the number of questions, or the location of the questions. This lack of
objectivity of the question-technique not only makes it difficult to generalize
research results but it also reduces its practical usefulness. An expert must
be employed to produce the questions.

From the standpoint of objectivity in development, the cloze-technique
has a distinct advantage over the question-technique. However, there are two
primary disadvantages inherent to the cloze-technique which are not inherent to
the question-technique. The cloze-technique involves a mutilation of the
learning materials, i.e., the prose, and this degradation of the original
information should detract from learning. Some of the information is missing
and not all of the learners are likely to be able to correctly fill in all of
the missing parts and thereby reconstruct the original message. Thus, the
cloze-technique may be expected to increase learning by forcing increased
attention, but at least part of this gain is probably lost due to the fact that
not all of the original information is presented. This is not a problem with
the question-technique as the prose passage is always presented in its entirety;
questions are added but nothing is taken away. The other primary disadvantage
of the cloze-technique is that it represents a task which is quite different
from ordinary reading. An individual must become a problem solver for each
missing word by trying out several candidate alternatives and then choosing the
one that seems best. Then, the learner must disengage himself from the primary
learning task and write an answer in the blank space provided. This makes the
task highly inefficient as compared to ordinary learning by reading. Since the
multiple-choice task does not present this problem, it seems inherently better
in this regard as compared to the cloze-technique.

It appears that both the question-technique and the cloze-technique contain
serious inherent disadvantages. Recently, a method, called reading-input, has
been advanced which is similar to cloze but seems to encompass its advantages
while overcoming its disadvantages (Carver, 1971a). The technique has been used
as a test (see Carver, 1970) and as a method for estimating material difficulty
(see Carver, 1973d), but it also seemed useful as a technique to manipulate
attention during learning. An example of the reading-input technique is
presented in Fig. 1. Notice that it is similar to the cloze-technique except
that, instead of deleting words and requiring fill-ins, an incorrect word is
added as an alternative to the correct word.

The reading-input technique overcomes the two primary disadvantages of the
cloze-technique. There is very little interruption of the normal reading act
since the correct answer is usually readily recognized and there is no necessity
to write out a word since the simple checking of a box is all that is required.
The reading-input technique also preserves the objectivity advantage of the
cloze-technique; there is no subjectivity involved in the selection of the
alternative wrong words. The alternative wrong words are selected from the
surrounding context according to an algorithm that is amenable to computer
production (see Carver, 1971b).

Fig. 1. An example of the reading-input technique applied to a segment
of a passage.
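
The selection algorithm itself is given in Carver (1971b) and is not reproduced
in this report. As a rough illustration only, the following Python sketch shows
the general shape of a reading-input conversion. It assumes, hypothetically,
that every fifth word becomes an item and that the wrong alternative is drawn
at random from a small window of neighboring words; the function name
make_reading_input and the parameters every, window, and seed are inventions
for this sketch and should not be taken as the published procedure.

```python
import random


def make_reading_input(text, every=5, window=4, seed=0):
    """Sketch of a reading-input conversion (assumed rules, not Carver's).

    Every `every`-th word is shown next to a wrong alternative taken from
    the surrounding `window` words. Returns the converted text and a key
    of (word position, correct word, wrong word) tuples.
    """
    rng = random.Random(seed)
    words = text.split()
    converted = []
    key = []
    for i, word in enumerate(words):
        if (i + 1) % every != 0:
            converted.append(word)
            continue
        # Candidate wrong words: nearby words that differ from the correct one.
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        candidates = [w for w in words[lo:hi] if w.lower() != word.lower()]
        if not candidates:
            converted.append(word)
            continue
        wrong = rng.choice(candidates)
        pair = [word, wrong]
        rng.shuffle(pair)  # hide which position holds the correct word
        converted.append("[" + " / ".join(pair) + "]")
        key.append((i, word, wrong))
    return " ".join(converted), key


if __name__ == "__main__":
    sample = "A Post Office helper must be honest. He must be a good worker."
    converted_text, answer_key = make_reading_input(sample)
    print(converted_text)
    print(answer_key)
```

Because the full passage is retained and the reader only marks one of two
words, a conversion of this kind keeps the two properties claimed for the
technique: no information is removed, and no word has to be written out.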

The reading-input technique is very similar to programmed instruction when
it is used as a method for manipulating attention by forcing the learner to
interact or behaviorally respond to the prose. Thus, the reading-input technique
has been termed programmed prose when it is used for this purpose, and the
reading-input materials are called programmed prose materials.

There is nothing inherent in programmed prose materials which will allow
learning to be directly manipulated. An individual who correctly chooses the
alternatives in programmed prose materials will not necessarily have understood,
comprehended, or stored the information contained in the materials. An analogy
can be made to the old saying that you can lead a horse to water but you cannot
make him drink. Programmed prose materials cannot be expected to make the horse
drink (the student learn) but they can be expected to make him put his mouth in
the water (interact with the learning material). And, you can be reasonably sure
that the horse (the learner) will never ingest any water (information) if he
never puts his mouth into the water (if he never interacts).



Programmed prose would seem to be an indirect facilitator of learning in
those situations where attention was not expected to be optimal. If the
learner was sufficiently motivated, if the materials were not too difficult,
and if the time spent learning was not too extensive, then programmed prose
would probably be an inefficient way to learn. For a college student who has
high ability and who is motivated to learn in his introductory psychology class,
the conversion of the regular textbook prose into programmed prose is likely to
inhibit the amount he learns per fixed interval of time. This is because the
lack of attention is not a problem. Programmed prose would force this
individual to do things while reading that would interrupt his normally
efficient learning activities. On the other hand, for students who are
relatively low in ability and who are not very interested in psychology,
programmed prose may force them to attend to the learning material. Therefore,
the student may learn much more efficiently than otherwise would be the case.
Programmed prose could be expected to facilitate learning in those situations
where attention wanes, but it should inhibit learning in those situations where
attention is continuously high.

The major purpose of this research project is to investigate the efficacy
of the programmed prose technique in situations where it is, and is not,
expected to facilitate learning.


Background for the Reading-Storage Test



Richard Anderson (1972) has contended that procedures for constructing
and describing comprehension tests are a "mess" because it is impossible to
know what the tests measure. He recommends that drastic action be taken.
The present research on a reading-storage measure represents an attempt to
solve the long recognized problem that Anderson has articulated so well. How
do we measure the primary or beneficial effects of reading? What empirical
measure can we use that will provide strong evidence that the individual who
is given something to read has in fact comprehended, understood, or stored the
information that he has supposedly read? Anderson argues extensively for the
use of paraphrase questions. He contends that an individual can answer other
types of questions, e.g., verbatim, using orthographic or phonological encoding
without any of the comprehension that is associated with semantic encoding.

Before performing a critical analysis of the various available techniques
for measuring the primary effects of reading, some background is needed for
understanding the processes involved in reading. First, it is necessary to
discriminate among the possible processes that an individual may engage in when
a prose passage is presented. The individual may scan the passage to find a
particular word. The individual may rapidly skim or skip over the passage to
get some idea of what the passage is about. The individual may try to memorize
the passage word for word. The above activities and purposes are not exhaustive
but are illustrative of the many different ways of processing the information
contained in prose materials. None of the above ways are of direct interest at
present. What is of interest is the type of activities normally engaged in when
a person is said to be reading. These activities are considered to be a
communication process wherein the thoughts of the originator of the
communication (i.e., the author) are being understood, comprehended, or stored
by the reader of the communication (see Skinner, 1957, p. 278). Carver (1971a)
has contended that the "understanding" function is automatically a type of
storage function in that understanding means that the thought has been
successfully related to previously stored thoughts, knowledge, or information.
This "relating to" function is a type of coding function that is similar to
what Pribram (1968) talks about when he discusses the basic coding of
experiences, i.e., imaging. That is, most reading is an experiential process
which stands as a surrogate for the following: (a) actually visiting the island
that the passage describes, (b) actually participating in the happenings
associated with the characters of a novel, or (c) actually witnessing the
events described by a newspaper reporter. This "relating to" function is also
similar to what Robinson (1960) called the fusion of ideas read with previous
experience, and what Newell and Simon (1967) referred to as a chunk fixated in
long-term memory or "familiarization." In a recent simulation of human
long-term memory, Frijda (1972) explained the process this way:

The same interaction between input and stored information may be
used to integrate new information into the network. By linking it
up to an appropriate place, available implications may become
accessible and, therefore, the corresponding variety of access-ways
may permit subsequent recall. Locating new information in the network
may well be considered the major aspect of the process of
"understanding" new input [p. 16].

One of the obvious implications of the above view of reading is that it is
normally a process which involves no output or retrieval mechanisms (see Carver,
1971a). Just like any other ordinary experience, while the event is happening
there is usually no subprocess which codes certain parts so that they can be
retrieved later (i.e., remembered) with perfect accuracy. While experiencing
or witnessing an automobile accident there is usually no subprocess which
automatically goes about coding the facts that the insurance investigators will
later regard as important. It is only after the event that we remember to
execute these coding procedures; otherwise we may not be able to accurately
retrieve those parts of the event which insurance companies regard as important.
The information that is stored during reading involves a similar experiential
process, i.e., no automatic retrieval codes are built into the experience so
that when these codes are later cued, certain aspects of the experience can be
efficiently retrieved with near perfect accuracy.

This view of the reading process, presented above, may be considered as an
inefficient information-processing system. Yet, it can be argued that it is
highly efficient. Unless an individual can predict quite accurately those
selective aspects of his experience which will be deemed as important in the
future, he will spend a great deal of his time extensively coding for retrieval
many parts of his experience that will never be useful to him in the future.
This would be inefficient.

Because reading is primarily an experiential event, with no automatic
coding of events using specific retrieval cues, it is difficult to ascertain
whether a certain stimulus event has actually been experienced or stored. If
all the important aspects of a reading experience were stored in memory in
certain locations identifiable by numbers, as is ordinarily the case with
computers, then we could find out if a certain aspect of an event had been
experienced by asking for a retrieval of the information stored using the
appropriate code number. Such is not the case with reading.

If the information that is stored during reading is not stored with specific
retrieval codes, how does one go about measuring the primary effects of reading?
In the following excerpt, Spinelli (1970) provides a hint of how we can probe
the existence of this type of stored information:

It seems to be more economic to suggest that the basic structure
of the memory system used by the brain is not addressed by location
(location addressable) but by content (content addressable). What
this means is that to retrieve a chunk of information all that is
necessary is to provide the system with a fraction of the chunk,
and the remainder will be played back [p. 295].

It appears that one of the best ways of determining whether a chunk of
the information contained in a reading passage was stored is to provide a part
of the original chunk, and see if this will cue the remainder of the original
information. Now, we are faced with how to actually put this theory into
practice. The traditional use of multiple-choice questions makes use of this
theory when it provides part of the original information in the question and
the alternative answers. However, as Anderson (1970) has noted, the traditional
use of multiple-choice questions has other large disadvantages. There are
usually no guidelines or standards for: (a) what type of questions to ask or
(b) how many questions to ask. Anderson (1970) has made recommendations which
provide solutions to both of these problems. He states: "...in order to answer
a question based on a paraphrase, a person has to have comprehended the original
sentence, since a paraphrase is related to the original sentence with respect to
meaning but unrelated with respect to the shape or sound of the words [p. 150]."
Thus, the paraphrase suggestion provides a solution to the problem of what type
of question to ask, and Anderson seems to have suggested, indirectly, a solution
to the problem of how many questions to ask by mentioning the sentence as a
unit. That is, a paraphrase question could be written for every sentence in a
passage. Or, a proportion of the sentences could be sampled and the results
generalized to the population of sentences.

The major problem with the paraphrase question approach is that the quality
of the solution depends upon the artistic ability of the experimenter. As such,
it has the liabilities of any subjective solution. Its replicability is highly
questionable and large amounts of human resources are needed for implementation.
The paraphrase approach may be labeled as an objective approach since the
answers to the questions may be determined objectively. Yet, the only part of
this type of test that is objective is the scoring. The test development is
subjective and to label such tests as objective tests is highly misleading.

Another method which has been used and recommended as a measure of the
effects of reading is the cloze test (e.g., Bormuth, 1969a). Again, the primary
advantage of the cloze procedure when it is used as a measure of reading
comprehension is that it is highly objective from a development standpoint.
Scoring of this test usually involves some subjectivity, but this is minimal.
The cloze test also provides an inherent solution to the problem of what type
of question to ask. Cloze also seems to provide a technique which is compatible
with the theory of providing a part of the original information as a cue for
playing back all of the information. In most cloze tests, four-fifths of the
original information is presented and one-fifth must be played back (i.e., every
fifth word is deleted).
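
The every-fifth-word rule is simple enough to state as a short program. The
Python sketch below is offered only as an illustration of that rule; the
function name make_cloze, the blank marker, and the decision to start counting
at the first word of the passage are assumptions of this sketch rather than
details taken from the cloze literature cited above.

```python
def make_cloze(text, every=5, blank="_____"):
    """Delete every `every`-th word and return the cloze text and answers.

    Words are split on whitespace, so punctuation stays attached to the
    word it follows (an assumption made here for simplicity).
    """
    cloze_words = []
    answers = []
    for position, word in enumerate(text.split(), start=1):
        if position % every == 0:
            cloze_words.append(blank)  # the reader must supply this word
            answers.append(word)
        else:
            cloze_words.append(word)
    return " ".join(cloze_words), answers


if __name__ == "__main__":
    sample = "A Post Office helper must be honest. He must be a good worker."
    cloze_text, key = make_cloze(sample)
    print(cloze_text)  # A Post Office helper _____ be honest. He must _____ a good worker.
    print(key)         # ['must', 'be']
```

With the default setting, four of every five words remain visible, which is the
four-fifths to one-fifth split described above.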

Although the cloze test has several advantages as a measure of the primary
effects of reading, it does have an inherent disadvantage that needs to be
discussed. If too much of the original information is presented, the learner
will process information from the test itself and be able to infer the original
information in its entirety without ever encountering the original information.
Anderson (1972) discusses one aspect of the problem as follows: "The trick will
be to devise techniques for constructing questions that can be answered if a
person has semantically encoded a communication but not answered if it has been
encoded only perceptually or phonologically [p. 148]." Thus, the problem is to
provide enough of a chunk of information to arouse the stored information if,
in fact, it has been stored but not provide so much information that the
incorrect parts on the test can be recognized from background knowledge or from
the information presented in the test itself.

The cloze test, as it is normally constructed, provides so much of the
original information that individuals can correctly fill in many blanks without
ever storing the original information. Also, when all of the information in a
passage has been stored, it is often impossible to recall the precise words
used in the original passage. The above reasons explain why the cloze test
has been shown to be relatively insensitive to the gain in information that
accrues during reading (see Carver, 1973a; Coleman & Miller, 1968). The
regular cloze test seems to provide too much of the original chunk of
information to be sensitive to low degrees of information stored, and the nature
of the cloze task itself is such that it is not sensitive to high degrees of
information stored.

It is not a simple matter to determine the quantity and quality of the
original chunk of information to be provided when testing for information stored.
This general problem is explained in some detail by Spinelli (1970) who states:

If wave forms in the brain represent stimuli, responses and the
consequences of responses as we have previously seen (Pribram et al.,
1967), then presentation of the stimulus will generate a playback
of the whole sequence; that is to say: recognition of the stimulus,
the appropriate behavior that went with the stimulus, followed by
the expectation of the consequences of the behavior. The amount of
extra information obtained by the network or by the organism is
greater, the smaller the segment of the total input string. The
amount of uncertainty, and therefore of risk for the organism in
using the sequence itself becomes, on the other hand, correspondingly
greater. An analogy in the auditory mode helps in understanding the
significance of this parameter. The name of a song followed by the
playing of the whole song will, of course, be recognized, if it has
been heard before. The name of the song followed by half of the
song will enable the listener to remember the remainder of the song.
Ultimately, just the name of the song, or a few notes, will enable
the listener to recall it entirely. But if the notes are too few,
or if the name of the song is equivocal, then the level of match
would be correspondingly very, very small and might not enable the
recaller to identify which song we are referring to. It might be
that the few notes provided are part of the beginning of many songs.
Ideally, then, the acceptable match parameter should be set for that
minimum value which allows unequivocal recognition of the stimulus
with recall of the associated behavior and consequences of behavior.

The trick, then, is to provide the minimum value of the match parameter
which will trigger the retrieval of the originally stored experience. If too
little of a passage is presented, then the original experience will not be
reconstructed. If too much of the original passage is presented, it will be
difficult to determine whether that which is played back provides evidence that
the information was originally stored or whether the individual was able to
correctly infer most of the original passage from the information he was
presented on the test.

The original reading-storage type of test, suggested by Carver (1971a), was
conceived so as to solve the above problem as well as to provide a completely
objective test. However, this first suggested form of the reading-storage test
was pilot tested and found to be only slightly better than the cloze test in
discriminating between those who had actually read a passage prior to taking
the test and those who had taken the test without ever having an opportunity
to read the passage. Subsequent to this finding, a number of different types
of reading-storage tests were developed and pilot tested. An example of the
type of reading-storage test which seemed to have the most potential is presented
in Fig. 2 on the next page. The passage on which the test is based is also
presented in this figure. The answers to the first few items, in Fig. 2, have
been circled as an aid to understanding the following description of the test.
The reading-storage (RS) test consists of the original passage in capitalized
form except for every other word; only the initial letter of every other word
remains. And, out of each consecutive set of five such initial letters, one has
been deleted and replaced with an incorrect alternative according to an
algorithm (see Carver, 1973c). The task for the individual is to read the
original passage, turn to the test and, without referring back to the original
passage, identify the one wrong initial letter of each set of five.
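
The construction just described can be sketched in a few lines of Python. This
is a minimal illustration under stated assumptions, not the algorithm of Carver
(1973c), which is not reproduced in this report: the sketch assumes that the
reduced words are the second, fourth, sixth, and so on; that the item to be
falsified within each set of five, and its wrong letter, are picked at random;
and that a leftover set of fewer than five initials is left unaltered. The name
make_rs_test and its seed parameter are hypothetical.

```python
import random
import string


def make_rs_test(passage, seed=0):
    """Sketch of a reading-storage (RS) test under assumed selection rules.

    Returns the test text and a key of (word position, correct initial,
    wrong initial) tuples, one per complete set of five initials.
    """
    rng = random.Random(seed)
    tokens = passage.upper().split()

    # Every other word (assumed: 2nd, 4th, ...) keeps only its initial letter.
    reduced = list(range(1, len(tokens), 2))
    for i in reduced:
        tokens[i] = tokens[i][0]

    # In each complete set of five initials, replace one with a wrong letter.
    key = []
    complete = len(reduced) - len(reduced) % 5
    for start in range(0, complete, 5):
        group = reduced[start:start + 5]
        target = rng.choice(group)
        correct = tokens[target]
        wrong = rng.choice([c for c in string.ascii_uppercase if c != correct])
        tokens[target] = wrong
        key.append((target, correct, wrong))
    return " ".join(tokens), key


if __name__ == "__main__":
    sample = ("A Post Office helper must be honest. He must be a good worker. "
              "A Post Office helper handles lots of mail.")
    test_text, answer_key = make_rs_test(sample)
    print(test_text)
    print(answer_key)
```

Scoring such a test needs only the key: a response is correct when the letter
marked as wrong in a set of five is the one the generator replaced.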

The basic rationale underlying the RS test has already been presented.
However, there is one additional aspect of the test that needs to be discussed.
What about an idiot savant who could memorize the words without understanding or
comprehending anything, and still make a perfect score on the reading-storage
test, thus erroneously indicating perfect storage, understanding, or
comprehension? The frequency of occurrence of such idiot savants is either zero
or near zero so it does not seem necessary to be too concerned with this
problem. However, the question still remains, to what extent does the simple
memorization of words erroneously inflate the scores on the RS test? Although
learning, for many experimental psychologists, is synonymous with memorization
of words (e.g., see King, 1971), others, such as Danks (1969), have argued that
the learning and comprehension process "...are not necessarily isomorphic and
the variables identified as important in one situation may have a minimal effect
in the other [p. 696]." The primary assumption underlying the use of the RS test
that is relevant to this memorization problem concerns a normal forgetting
curve. Anderson (1972) has suggested that "...a printed verbal stimulus is
usually phonologically encoded and then, if it is to be remembered for more than
a few moments, it is semantically encoded [p. 146]." It is assumed that memory
for orthographic and phonological cues does fade quickly and that the
reading-storage test would be valid only in those situations where a certain
minimum body of prose was tested under certain time limit conditions. The
reading-storage test would be expected to be valid


Example Passage



Tliis is our Post Office, It is in our city. Maiiy peoplo work here, There 
is a Post Office in every city in our country. And Post Offices in every country 
in the world. 

A Post Office helpor must be honest. He must bo a good worker. A Post Office 
helper handles lots of mail. A Post Office helper hfindles lots of money. 

TTie Post Office sends letters and packagos, magazines , and newspapers all over 
the world. It sends small imimals and plants, too. It sends money for us* It 
saves money for us. It puts money to work for us, too. 



Reading Storage Test on an Bxample Passiigc 
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only when an individual was given an amount of time to spend on a passage that 
approximated the amount required for normal reading (e.g., between about 100 and 
300 words per minute). If extensive amounts of time are given, then the test is 
likely to become invalid as a test of the degree of understanding or comprehension 
because memorization of words is likely to become a primary factor. Also, when 
the number of words m a passage drops below 100^ then it seems reasonable to 
beconia concerned about short term memory for words (i.e,, orthographic and phono- 
logical encoding) as a primary factor. Certainly, the reading-storage test would 
not be valid for the .comprehension of isulatad sentences, a primary concern of 
Anderson (1972), 

Thus, the use of the reading-storage test appears to be on a rationally
sound basis when it is used to measure the understanding or comprehension which
occurs during the usual processing (i.e., reading) of prose. Stated differently,
there seems to be ample theoretical rationale to support the investigation of the
reading-storage test as an indicator of the extent to which chunks of information,
in the form of sentences, are stored during the normal reading of prose.

It is assumed that there will never be a perfect measure of comprehension,
understanding, or stored experiences. This is because the concept itself is not
precise enough to warrant any one perfect measure, and because better empirical
indicants will only increase the probability of correctly detecting degrees of
comprehension. As scores on the reading-storage test increase, the probability
that comprehension occurred also is assumed to increase. It does not necessarily
solve the problem to make the criterion for comprehension more and more stringent,
as Anderson (1970) seems to do. As the test becomes more and more stringent,
the probability increases that a person who does well on the test did in fact
store the information. However, while the test is made more and more stringent,
the probability also increases that a person who did in fact store the information
will be erroneously regarded as not having done this because he did poorly on
the test. The number of errors, both Type I and Type II, needs to be minimized,
and it is not appropriate from a strict measurement standpoint to minimize one
type of error at the expense of the other type.

The reading-storage type of measure seems to show a great deal of promise
as an optimum indicator of the important and primary effects of normal reading,
whether these effects be called understanding, comprehension, or stored
information. One of the purposes of this research project is to investigate the
validity of the reading-storage test, by studying the extent to which its purported
theoretical validity can be supported with empirical evidence.

Set I Experiments

Overview

Purpose. Three separate experiments (Experiment IA, Experiment IB,
and Experiment IC) were conducted in the Set I Experiments. The general
purpose for all three experiments was to investigate the properties of the
reading-storage test by comparing it to a form of the cloze test. The regular
cloze test had already been shown to be insensitive to the primary effects of
reading (Carver, 1973a), but other forms of the test appeared to deserve full
investigation.

One way to evaluate the reading-storage test is to compare the amount of
gain on the test with the amount of gain as estimated from the Ss' own
subjective estimates of degree of understanding. Subjective estimates of the
understanding of isolated sentences were used successfully by Schwartz,
Sparkman, & Deese (1970), and Carver (1973a) found that such estimates were very
reliable and extremely sensitive to the primary effects of reading. Carver found
that these estimates approached zero when understanding would be expected to
approach zero and they approached 100% when the accuracy of understanding
would be expected to approach 100%. Thus, it appears that the reading-storage
test, as well as the modified form of the cloze test, can be evaluated by
comparing its sensitivity with the sensitivity of the understanding judgments.

Subjects. Forty-eight college students from the University of Maryland
were paid to participate. The volunteers were recruited via an advertisement
placed in the school newspaper. The advertisement referred to an educational
research project without explaining the nature of the experiment, i.e., that
it involved the administration of reading tests.

Procedure and Instructions. The Ss were tested in 14 sessions ranging
in size from three to four individuals per session. The general nature of the
experiment was explained at the outset. The Ss were told that they: (a) would
receive $6.00 in cash at the end of the four hour experimental session, (b)
could receive an additional bonus depending upon their test performance, and
(c) would probably receive an average bonus around $6.00 with some individuals
earning a bonus as low as $4.00 and some as high as $11.00 or $12.00.
The Ss were further informed that: (a) the testing would be conducted in
three separate studies, (b) there would be a 10-minute break approximately every
hour, and (c) there would be 22 short reading tests altogether.

The Ss were told that the paragraphs they would be given to read would be
approximately 100 words in length and that they would be given one minute to
read each paragraph prior to taking the test on the paragraph. They were told
that if they finished reading a paragraph before the time limit expired, they
should go back to the beginning and read it again, and keep reading until the
buzzer sounded. The Ss could ascertain the amount of time remaining by observing
the timer which was located immediately in front of them.

Tests. There were two types of tests: a reading-storage type test
(RS-Test) and a modified-cloze type test (MC-Test). An example of the RS-Test
has already been presented in Fig. 2.

The procedures for developing the RS-Test used in this research have been
described in detail elsewhere (Carver, 1973c). Briefly summarized, the procedures
are: (a) retype the original passage in capital letters with 10 words per line
of running text, (b) for every other word, delete all letters except the initial
letter of the word, and replace the missing letters with a designating symbol
such as a standard dash, and (c) for each line containing five of these skeleton
type words, randomly delete one of these five initial letters and replace it
with a different letter selected from the population of initial letters in the
passage. The task for the S was to circle the wrong letter on each line of the
test. The time limit for each RS-Test was 3 minutes. Pilot data indicated
that this amount of time allowed 90 - 100% of the Ss to finish the test.
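
Steps (a) through (c) can be sketched in a few lines of Python. This is only an
illustrative sketch, not the algorithm of Carver (1973c), which is not
reproduced in this report. The function names, the choice of the 2nd, 4th, ...
word on each line as the skeleton words, the handling of punctuation, and the
seeded random generator are all assumptions made for the example.

```python
import random

PASSAGE = (
    "This is our Post Office. It is in our city. Many people work here. "
    "There is a Post Office in every city in our country. "
    "And Post Offices in every country in the world."
)


def skeleton(word, initial=None):
    """Keep one initial letter and replace every other letter with a dash."""
    first = word[0] if initial is None else initial
    return first + "".join("-" if ch.isalpha() else ch for ch in word[1:])


def build_rs_test(passage, words_per_line=10, items_per_line=5, seed=0):
    """Return (test_lines, key); key[i] is the index of the altered word
    on line i, or None when the line holds fewer than five skeleton words."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    words = passage.upper().split()  # step (a): capitals, 10 words per line
    # Population of initial letters in the passage, used in step (c).
    initials = [w[0] for w in words if w[0].isalpha()]
    test_lines, key = [], []
    for start in range(0, len(words), words_per_line):
        line = words[start:start + words_per_line]
        # Assumption: the 2nd, 4th, ... words on a line are the skeleton words.
        skeleton_positions = list(range(1, len(line), 2))
        wrong = None
        if len(skeleton_positions) == items_per_line:
            wrong = rng.choice(skeleton_positions)
        out = []
        for i, word in enumerate(line):
            if i not in skeleton_positions:
                out.append(word)  # intact word
            elif i == wrong:
                # Step (c): swap in a different initial letter from the passage.
                others = [c for c in initials if c != word[0]]
                out.append(skeleton(word, rng.choice(others)))
            else:
                out.append(skeleton(word))  # step (b): initial plus dashes
        test_lines.append(" ".join(out))
        key.append(wrong)
    return test_lines, key
```

Calling build_rs_test(PASSAGE) on the first paragraph of the example passage
gives three scorable lines and one short final line that carries no item,
since it holds fewer than five skeleton words.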

The MC-Test was a modification of the regular cloze technique. Instead
of deleting every fifth word, as is usually done in the regular cloze procedure,
every fourth, fifth, and sixth words were deleted and replaced with the initial
letter of the word. Presented in Fig. 3 is an example of an MC-Test for the
example passage in Fig. 2. The task for the S was to fill in the blanks with the
correct words using the initial letters as cues. As an example, the correct
words have been inserted for the first one-half of the test in Fig. 3. The
time limit for the MC-Tests was seven minutes. Pilot data indicated that this
amount of time allowed 90 - 100% of the Ss to complete the test.

Fig. 3. An example of a Modified-Cloze (MC) test for the example passage
in Fig. 2.

Design. The overall design for the Set I Experiments is presented in Table 1.
There were two types of experimental sessions, Type I and Type II. The Ss who
participated in the Type I session were administered RS-Tests in Experiment IA and
MC-Tests in Experiment IB and Experiment IC. The Ss who participated in the Type
II session were administered MC-Tests in Experiment IA and RS-Tests in Experiment
IB and Experiment IC. The experimental passages were all the same in all three
studies; only the type of test varied from the Type I session to the Type II session.

Table 1

Overall Design for Set I Experiments

                           Session
                   Type I          Type II
                   (N=24)          (N=24)

Experiment IA      RS-Tests        MC-Tests
Experiment IB      MC-Tests        RS-Tests
Experiment IC      MC-Tests        RS-Tests

Scoring. The scores on RS-Tests were percent correct scores derived from the
following equation:

\[
\text{Percent correct} = \frac{\text{Number of items right} - \tfrac{1}{4}\,(\text{Number of items wrong})}{\text{Total number of items}} \times 100 \tag{1}
\]

The percent correct score on the MC-Tests was determined by: (a) counting
the number of correctly filled-in items (spelling errors were disregarded), (b)
dividing by the total number of items, and (c) multiplying by 100.
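
The two scoring rules can be written out as a short Python sketch. The function
names are hypothetical and are used here only for illustration. Note that with
five alternatives per RS-Test item, blind guessing is expected to produce one
right answer for every four wrong ones, so the one-fourth penalty in Equation 1
brings the expected score for pure guessing to zero.

```python
def rs_percent_correct(n_right, n_wrong, n_items):
    """Equation 1: items right minus one-fourth of items wrong,
    divided by the total number of items, times 100."""
    return (n_right - n_wrong / 4) / n_items * 100


def mc_percent_correct(n_correct, n_items):
    """MC-Test score: correctly filled-in items over total items, times 100."""
    return n_correct / n_items * 100
```

For example, an S who marks 2 items right and 8 wrong on a 10-item RS-Test
(the chance expectation) receives a score of zero, while 15 correct blanks out
of 20 on an MC-Test is scored as 75.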

Data analysis. Since many of the score distributions were highly skewed,
the average value for any particular treatment condition was estimated by the
median.

Experiment IA

Purpose. The primary purpose of Experiment IA was to determine how
sensitive the RS-Test was to the primary effects of reading by administering
the test under both reading and non-reading conditions. The sensitivity of
the RS-Test was to be evaluated by comparing it to the MC-Test and to
understanding judgments. To determine the generality of its sensitivity,
four different levels of passage difficulty were investigated.

Subjects. There were 24 Type I Ss and 24 Type II Ss.

Procedure. In order to familiarize the Ss with the task required by each
test, they were given an example passage and an example test on the passage.
The first one-half of the answers were already completed on the example test.
With this example test and the original passage in front of them, the Ss were
instructed to try to figure out the answers to the remainder of the test. The
E gave each S individual help as was needed, and the answers were graded so as
to be certain that each S understood how the test worked. Then, the Ss were
informed that they would be administered a reading passage and a test on the
passage for practice. All of the regular experimental procedures were employed
during this practice trial.

Prior to the presentation of the practice trial, an estimation procedure
was explained to the Ss. They were asked to estimate the number of complete
thoughts in a passage that they understood immediately after they finished
reading a passage and immediately before they started to work on the test on
the passage. They were instructed that: (a) a sentence is a complete thought,
and (b) the estimate could be anywhere between 0 and 100%, with 0 indicating
that none of the complete thoughts had been understood and 100 indicating that
all of the complete thoughts had been understood. At the top of each test the
following percents were typed (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
and the S was instructed to circle the one which best represented his own estimate.

The Type I Ss were informed that they would receive 5¢ for each correct
answer on each test, i.e., the RS-Test, and the Type II Ss were informed that
they would receive 1¢ for each correctly filled-in blank, i.e., the MC-Test.
The Ss were also informed that they would be given 10 tests, but that on 5 of
the 10 they would not be given the passage to read prior to taking the test.

Passages. The 100-word passages used in Experiment IA were selected from among
the 530 passages studied by Bormuth (1969b). The eight passages selected were
chosen to represent four levels of difficulty as indicated by the RIDE Scale
(see Carver, 1973d). The RIDE Scale is simply the number of letters per word
(lpw). The RIDE Levels are: Level 1, up to 4.0 lpw; Level 2, 4.1 to 4.5 lpw;
Level 3, 4.6 to 5.0 lpw; Level 4, 5.1 to 5.5 lpw; Level 5, 5.6 lpw and above.
Two passages were chosen to represent each of Levels 1-4. The two Level 1
passages were randomly selected from those passages with RIDE values of 3.7 and
3.8. Levels 2-4 were chosen from those with RIDE values of 4.2 & 4.3, 4.7
& 4.8, and 5.2 & 5.3, respectively. After selection, the eight passages were
divided into two sets, A and B, with one passage at each level in each set, i.e.,
passages 1A, 2A, 3A, 4A, 1B, 2B, 3B, & 4B. Two extra passages were chosen to
be used as additional practice passages; one was at Level 1, 1P, and the other
at Level 3, 3P.
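
As an illustration of how a RIDE value and Level could be computed for a
passage, consider the following Python sketch. The function names are
hypothetical. Treating a word as any unbroken run of letters and rounding the
value to one decimal place before applying the level boundaries are assumptions
made for this example; the exact counting rules are given in Carver (1973d),
not in this report.

```python
import re


def letters_per_word(text):
    """RIDE value: mean number of letters per word (lpw).
    Assumption: a word is any unbroken run of letters."""
    words = re.findall(r"[A-Za-z]+", text)
    return sum(len(word) for word in words) / len(words)


def ride_level(lpw):
    """Map a letters-per-word value onto RIDE Levels 1 through 5."""
    lpw = round(lpw, 1)  # assumption: boundaries apply to one-decimal values
    if lpw <= 4.0:
        return 1
    if lpw <= 4.5:
        return 2
    if lpw <= 5.0:
        return 3
    if lpw <= 5.5:
        return 4
    return 5
```

Under these assumptions, the passage values quoted above (3.7, 4.2, 4.7, and
5.2 lpw) fall at Levels 1, 2, 3, and 4, respectively.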

Design. Table 2 presents the design for Experiment IA. The experiment
was designed to investigate the degree to which reading the passages affected
the test scores at each of the four difficulty levels while controlling for:
(a) differences between individuals, (b) within-level differences between
passages, (c) practice, and (d) order of presentation.

Practice was controlled in two ways. First, as already noted, the Ss were
administered a practice trial. Second, the first two tests, of the ten, were
also regarded by the E as practice. In Table 2, it may be noted that the first
two tests were exactly the same for all Ss. The Ss were paid a bonus on the
basis of these tests but these data were not analyzed.

There are two primary Latin-Squares embedded in Table 2. Ss 1 - 4 were
presented the A set of four tests under the reading condition (R) in one
Latin-Square, and the B set under the non-reading (R̄) condition in another
Latin-Square. Both of these four-by-four Latin-Squares were completely
counterbalanced for immediate sequential effects (see Bradley, 1958). For
Ss 5 - 8, the two primary Latin-Squares were reversed so that the tests taken by
Ss 1 - 4 after reading were presented without reading and the tests taken by
Ss 1 - 4 without reading were presented after reading.

The within-level differences between passages were controlled by using two
different passages at each level for each condition, reading and non-reading.
To minimize the differences between individuals, the design in Table 2 was
replicated three times, i.e., N=24. Also, the entire design was completed twice,
once for the RS-Tests, Type I Sessions, and once for the MC-Tests, Type II
Sessions. The order of testing alternated between 8 Type I Ss and 8 Type II Ss.
Table 2

Design for Experiment IA

                         Order of Presentation
Type of S     1    2    3    4    5    6    7    8    9    10
              R̄*   R*   R̄    R    R̄    R    R̄    R    R̄    R

    1         1P   3P   3B   4A   1B   2A   2B   1A   4B   3A
    2         1P   3P   4B   3A   3B   4A   1B   2A   2B   1A
    3         1P   3P   2B   1A   4B   3A   3B   4A   1B   2A
    4         1P   3P   1B   2A   2B   1A   4B   3A   3B   4A
    5         1P   3P   4A   3B   2A   1B   1A   2B   3A   4B
    6         1P   3P   3A   4B   4A   3B   2A   1B   1A   2B
    7         1P   3P   1A   2B   3A   4B   4A   3B   2A   1B
    8         1P   3P   2A   1B   1A   2B   3A   4B   4A   3B

* R is the reading condition and R̄ is the non-reading condition.

Order effects were controlled in two ways also. First, the two Latin
Squares for the reading and non-reading conditions controlled for order within
each condition. Second, the four reading tests given without reading were
alternated with the four given after reading, as is indicated in Table 2 by
the R and R̄ at the top of each column in the table.
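
The Latin-Square and alternation properties described above can be checked
mechanically. The following Python sketch encodes the eight experimental tests
(columns 3 through 10) for each type of S in Table 2 and verifies that each
embedded four-by-four square uses every difficulty level once per row and once
per column, and that the A and B sets alternate for every S. The names used
are hypothetical, and the sketch checks only these two properties; it does not
test the counterbalancing of immediate sequential effects.

```python
# Columns 3-10 of Table 2; each code is difficulty level followed by set.
TABLE_2 = {
    1: ["3B", "4A", "1B", "2A", "2B", "1A", "4B", "3A"],
    2: ["4B", "3A", "3B", "4A", "1B", "2A", "2B", "1A"],
    3: ["2B", "1A", "4B", "3A", "3B", "4A", "1B", "2A"],
    4: ["1B", "2A", "2B", "1A", "4B", "3A", "3B", "4A"],
    5: ["4A", "3B", "2A", "1B", "1A", "2B", "3A", "4B"],
    6: ["3A", "4B", "4A", "3B", "2A", "1B", "1A", "2B"],
    7: ["1A", "2B", "3A", "4B", "4A", "3B", "2A", "1B"],
    8: ["2A", "1B", "1A", "2B", "3A", "4B", "4A", "3B"],
}


def levels(subject, passage_set):
    """Difficulty levels, in order of presentation, for one passage set."""
    return [int(code[0]) for code in TABLE_2[subject] if code[1] == passage_set]


def is_latin_square(rows):
    """True if every symbol 1..n occurs once in each row and each column."""
    n = len(rows)
    symbols = set(range(1, n + 1))
    if any(len(row) != n or set(row) != symbols for row in rows):
        return False
    return all({row[j] for row in rows} == symbols for j in range(n))


def alternates(subject):
    """True if the A and B sets strictly alternate across the eight tests."""
    sets = [code[1] for code in TABLE_2[subject]]
    return all(a != b for a, b in zip(sets, sets[1:]))
```

For instance, is_latin_square([levels(s, "A") for s in (1, 2, 3, 4)]) examines
the reading-condition square for Ss 1 - 4.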

Results and Discussion. Figure 4 contains the understanding results for
both the RS-Test group and the MC-Test group. The data points in Fig. 4
represent the median percent understanding rating for the 24 values at each
difficulty level. Notice that the two curves are practically coincident,
suggesting between-group replicability and between-group comparability. One
tempting generalization is that the difficult passages seem to be more difficult
to understand, but this should not be inferred. Although each passage was 100
words long and the Ss were given one minute per passage, the average rate at
which the passages were presented was still not equal across difficulty levels.
This is because the more difficult passages contained longer words. It has been
shown that control for the physical length of the words tends to flatten out
curves similar to those in Fig. 4 (see Carver, 1972b; Miller and Coleman, 1972).
The Fig. 4 data can be subjected to an appropriate control procedure by
calculating the efficiency of thoughts stored in standard thoughts per minute
(stpm). When this control procedure is accomplished, efficiency turns out to be
around 5 stpm (4.5, 5.3, 5.5, & 4.6 stpm for both groups combined at the four
levels of difficulty), with no evidence of a monotonically decreasing trend.
These data lend support to the theory that individuals process information from
prose with the same degree of efficiency as long as the prose is not at a
difficulty level which is higher than their ability level (Carver, 1973b). The
data in Fig. 4 suggest that the accuracy of understanding decreases with
increases in the difficulty of the paragraphs, but when the differing passage
presentation rates were controlled, it was found that the efficiency of storing
thoughts was relatively equal for all four difficulty levels.

Fig. 4. Understanding as a function of difficulty for the RS-Test Group and
the MC-Test Group, Experiment IA (N=48).

Fig. 5 presents the test scores on both the RS- and MC-Tests for
both the reading and non-reading conditions. Notice that under the reading
condition, the two curves are almost coincident. The MC-Test curve under
the non-reading condition is somewhat erratic. Both tests seem to
reflect about the same amount of average gain from the non-reading to the
reading condition. Table 3 presents the gain in percentage points from the
non-reading condition to the reading condition for both tests at all four levels
of difficulty. The mean gain for the RS-Test was 38.2 and for the MC-Test
was 31.8. The standard deviation of the MC-Test, 10.9, was almost twice as large
as that of the RS-Test, 5.4. Thus, the RS-Test appears to be more sensitive
and more consistent, i.e., less variable, than the MC-Test to the primary
effects of reading.

Fig. 5. Test scores as a function of difficulty level for the RS-Test and the
MC-Test under reading and non-reading conditions, Experiment IA (N=48).

Table 3

Percentage Point Gains and Efficiency Ratios in Experiment IA
for Each Difficulty Level on the RS-Test and MC-Test

                   Percentage Point Gain
                                              Efficiency
                Test Score   Understanding      Ratio

RS-Test
  Level 1          45            96              .47
  Level 2          42            95              .44
  Level 3          32            85              .38
  Level 4          34            70              .49
  Mean             38.2          85.5            .45
  Stand. Dev.       5.4          10.5            .04

MC-Test
  Level 1          22            94              .23
  Level 2          50            91              .53
  Level 3          25            87              .29
  Level 4          30            63              .48
  Mean             31.8          83.7            .39
  Stand. Dev.      10.9          12.2            .13

Table 3 also contains the understanding gains that accompanied each test
score gain. Since understanding was zero, by definition, under the non-reading
condition, these values are the same as those plotted in Fig. 4. In order to
assess the relative sensitivity of the objective test scores, the test score
gains have been divided by the understanding gains to produce an efficiency
ratio. The ratios for each level of each test are as presented in Table 3.
The efficiency ratio is also the slope of the regression of test scores upon
understanding, and as such is a type of validity index. If the slope, i.e.,
the efficiency ratio, is zero, the test scores would be considered as
completely invalid since they would seem to be insensitive to the
primary effects of reading. If the slope was perfect, 1.00, the test score
would be considered as perfectly valid since the test was just as sensitive to
the primary effects of reading as is the most sensitive indicator known.
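
The efficiency ratio is a simple quotient, and the RS-Test entries in Table 3
can be reproduced from the tabled gains. The following Python sketch does so;
the function and variable names are hypothetical and chosen only for this
illustration.

```python
def efficiency_ratio(test_score_gain, understanding_gain):
    """Test score gain divided by understanding gain (both in percentage points)."""
    return test_score_gain / understanding_gain


# RS-Test rows of Table 3: level -> (test score gain, understanding gain).
RS_GAINS = {1: (45, 96), 2: (42, 95), 3: (32, 85), 4: (34, 70)}

rs_ratios = {
    level: round(efficiency_ratio(test_gain, und_gain), 2)
    for level, (test_gain, und_gain) in RS_GAINS.items()
}
mean_rs_ratio = sum(rs_ratios.values()) / len(rs_ratios)
```

The four rounded ratios agree with the RS-Test column of Table 3, and their
mean is about .45, as reported.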

The mean efficiency ratio for the RS-Test, .45, was slightly higher than
that for the MC-Test, .39. Also, the standard deviation of the MC-Test, .13,
was more than three times greater than the RS-Test, .04. Thus, this index
also seems to suggest that the RS-Test is more sensitive and more reliable than
the MC-Test.

These gain data in Table 3 and Fig. 5 can be directly compared to the
gain data collected by Carver (1973a). These comparisons can be made because:
(a) the Carver (1973a) data used paragraphs with an average RIDE Scale value of
5.05; (b) the gain associated with this 5.05 difficulty value can be found by
interpolating between Level 3 (about 4.8 on the RIDE Scale) and Level 4 (about
5.3 on the RIDE Scale) in Table 3 and Fig. 5; (c) the rate of presentation of a
paragraph with a RIDE Scale value of 5.06 can be estimated to be around 113
standard words per min. (swpm) since the average rate was 107.5 swpm for Level
3 and 118.2 for Level 4; and (d) the gain between non-reading and reading at
113 swpm can be found in the Carver (1973a) data. This gain between non-reading
and reading passages at a 5.06 difficulty level and 113 swpm can be calculated
for all three of the measures used in the Carver (1973a) data, i.e., chunked,
regular cloze, and revised cloze, as well as for the RS-Test and the MC-Test.

These percentage point gains are presented in Table 4. The highest percentage
gain was associated with the chunked test. However, it should be noted
that the chunked test was not developed in a standard objective manner as were
the other four tests. It was developed by empirical revision procedures
designed to produce items which reflect this type of gain, so it should not be
surprising that it is the most effective in this regard. Of the four remaining
objective test development approaches (three forms of cloze and the RS-Test),
the RS-Test had the most percentage point gain, 33. The three cloze tests
were formed by systematically deleting words, leaving blanks which had to be
filled in with the correct words. The traditional cloze test was formed using
the every fifth word deletion pattern, and the revised cloze test was formed
using an every fourth and fifth word deletion pattern. Also, the MC-Test was
formed by using a fourth, fifth, and sixth word deletion pattern with the
initial letter of the deleted words remaining.
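
The interpolation between Level 3 and Level 4 described above can be
illustrated with a short Python sketch. It assumes simple linear interpolation
on the RIDE Scale between the approximate level values of 4.8 and 5.3; the
report does not state the interpolation method, and the function name is
hypothetical.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation of y at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


LEVEL_3, LEVEL_4 = 4.8, 5.3  # approximate RIDE values of the two levels
TARGET = 5.05                # average RIDE value of the Carver (1973a) paragraphs

# Presentation rate in swpm, from 107.5 (Level 3) and 118.2 (Level 4).
rate = interpolate(TARGET, LEVEL_3, 107.5, LEVEL_4, 118.2)
# RS-Test gains from Table 3: test score 32 -> 34, understanding 85 -> 70.
rs_test_gain = interpolate(TARGET, LEVEL_3, 32, LEVEL_4, 34)
rs_understanding_gain = interpolate(TARGET, LEVEL_3, 85, LEVEL_4, 70)
```

Because 5.05 lies midway between 4.8 and 5.3, each interpolated value is simply
the midpoint of the two level values: a rate of roughly 113 swpm, an RS-Test
gain of about 33 points, and an understanding gain of about 77 points.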

Also included in Table 4 is the percentage gain in subjective estimates
of understanding associated with each type of test, as well as the efficiency
ratios. It should be noted that all of the understanding estimates should be
equal because, theoretically, the passage difficulty and presentation rate were
equal in each case. In fact, these values all are approximately equal, varying
only from 70 to 77%. Notice that the chunked reading test has high validity,
.80, while the RS-Test is next highest with an efficiency ratio of .43. These
data suggest that the RS-Test is more than twice as valid as the regular cloze
test, .19, as a measure of the primary effects of reading.

When compared to the chunked test, the RS-Test is much less sensitive to
the primary effects of reading. However, this comparison is not appropriate
because the RS-Test did not undergo any empirical iteration procedures
designed to make it highly sensitive to gain, as did the chunked test. It is
likely that an RS-Test on a particular passage could be made much more sensitive
to the effects of reading if it was revised on the basis of empirical gain data.
A more appropriate comparison for the RS-Test is the cloze test, because both of
these types are objectively developed in a standardized manner. Also, when this
comparison is made, it seems evident that the RS-Test is much more sensitive.
Only the Modified Cloze test that was used in the present study appears to come
close to being as sensitive to the primary effects of reading as is the RS-Test.

Table 4

Percentage Point Gains and Efficiency Ratios for Five Types of Tests

                            Percentage Point Gain
                                                      Efficiency
Type of Test             Test Score   Understanding     Ratio

Carver (1973a) Data
  Chunked                    61            76            .80
  Cloze                      14            74            .19
  Revised-Cloze              10            70            .14

Experiment IA Data
  RS-Test                    33            77            .43
  MC-Test                    28            75            .37

One result that deserves notice is the non-reading data in Fig. 5. More
items can be guessed at correctly without reading on the less difficult passages
as compared to the more difficult passages. Therefore, it would appear to be
impossible to accurately predict the amount of information that was stored
(or percent of understanding) from the absolute size of the RS-Test score.
No doubt, there are also large individual differences with respect to this
ability to correctly guess answers. An absolute prediction of understanding
or information stored using RS-Test scores should take into account
the score that would be expected without reading, given the difficulty of the
paragraph and the ability of the individual.

Experiment IB

Purpose. The purpose of Experiment IB was to investigate the properties
of the RS-Test by experimentally manipulating the amount of information present.
Information was manipulated by varying the percent of the words in the reading
passage which were deleted. Again, the RS-Test results were compared to the
MC-Test and the understanding judgments.



Subjects. There were 20 Ss in the Type I sessions and 20 Ss in the
Type II sessions.

Procedure and Instructions. The Ss were told that they would be given
six tests in this experiment and that the type of test would be different
from the previous experiment. They were then given the same example paragraph
as in the first experiment with a different example test. If the Ss were in
the Type I session, they were given the example MC-Test. If the Ss were in the
Type II session, they were given the example RS-Test. Again, help was given
to those individuals who had difficulty figuring out how the test worked.

The Ss were administered a practice trial using exactly the same practice
passage as was used in the first experiment. This time, however, every fourth
word on the passage had been omitted. The Ss were informed that they would read
six passages: one would have every sixth word omitted, one would have every
fifth word omitted, one would have every third word omitted, one would have every
second word omitted, and one would be administered with all words omitted (i.e.,
the non-reading condition).

Those Ss taking the RS-Tests were informed that they would receive 5¢ for
each correct answer, and those Ss taking the MC-Tests were informed that they
would receive 1¢ for each correct answer. Just as in the first experiment, the
Ss were instructed to indicate their understanding estimates by circling one of
the values at the top of each test.

Tests. The reading passages were taken from the five passages used in
Form A of the Carver-Darby Chunked Reading Test. The five experimental passages
were developed by counting the first 100 words of each passage, and then completing
the passage at the end of the sentence containing the 100th word.
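
The passage-cutting rule can be sketched in Python as follows. The function
name is hypothetical, and the sketch assumes that a sentence ends at any word
whose final character, ignoring closing quotation marks or brackets, is a
period, question mark, or exclamation point. Abbreviations and similar cases
would need hand checking.

```python
def cut_at_sentence_end(text, n_words=100):
    """Keep the first n_words words, then extend the passage to the end of
    the sentence that contains the n_words-th word."""
    words = text.split()
    if len(words) <= n_words:
        return " ".join(words)
    for i in range(n_words - 1, len(words)):
        # Assumption: a word ending in . ! or ? closes a sentence.
        if words[i].rstrip("\"')]").endswith((".", "!", "?")):
            return " ".join(words[:i + 1])
    return " ".join(words)
```

Under this rule, each experimental passage is at least 100 words long and
exceeds 100 words only by the remainder of the sentence in progress.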

The RS- and MC-Tests on each passage were developed using the procedures
outlined in Experiment IA.

Five different experimental conditions were developed for each reading
passage. The 0% condition consisted of the passage without any of the words
deleted. The 17% condition was formed by deleting every sixth word. The
33% condition was formed by deleting every third word. The 50% condition was
formed by deleting every second word. And, the 100% condition was formed by
deleting all words, the non-reading condition. The deletions were made by
applying a white covering substance (used by typists to make manuscript
corrections) on the words. The tests were reproduced by xerography, thus leaving
no clue as to the words omitted except for word length.
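
A rough Python analogue of the deletion conditions is sketched below. The
function and table names are hypothetical. The sketch assumes that counting
starts at the first word of the passage, so that the nth, 2nth, ... words are
removed, and it uses a run of underscores of the same length as each deleted
word to stand in for the whited-out space.

```python
def delete_every_nth(text, n, blank="_"):
    """Blank out every nth word, keeping word length as the only remaining clue.
    Passing n=None leaves the passage intact (the 0% condition)."""
    if n is None:
        return text
    words = text.split()
    return " ".join(
        blank * len(word) if (i + 1) % n == 0 else word
        for i, word in enumerate(words)
    )


# Deletion step assumed for each condition named in the text.
DELETION_STEP = {"0%": None, "17%": 6, "25%": 4, "33%": 3, "50%": 2, "100%": 1}
```

A step of 1 blanks every word, which corresponds to the non-reading condition.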



The practice passage for Form A of the Carver-Darby Chunked Reading Test
was used to develop one test. This test was considered by the E as practice
and was presented under only one deletion condition, every fourth word
deleted (25%).

Design. Table 5 contains the design for Experiment IB. This design was
employed to investigate the effect of varying passage deletions upon test scores.

Table 5

Design for Experiment IB

                          Order of Presentation
Group       1         2         3         4         5         6

  1       P(25%)    1(0%)     2(33%)    3(100%)   4(17%)    5(50%)
  2       P(25%)    2(17%)    3(50%)    4(0%)     5(33%)    1(100%)
  3       P(25%)    3(33%)    4(100%)   5(17%)    1(50%)    2(0%)
  4       P(25%)    4(50%)    5(0%)     1(33%)    2(100%)   3(17%)
  5       P(25%)    5(100%)   1(17%)    2(50%)    3(0%)     4(33%)

Practice effect was controlled by regarding the first test taken by the
Ss under actual experimental conditions as additional practice. That is,
the first test was always the same for all Ss, the practice passage (P) given
under the 25% condition. The remaining five tests were presented in a
Greco-Latin Square design to control for passage differences, order differences,
and individual differences.
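
The counterbalancing property claimed for this design can be checked mechanically. The sketch below is an illustration, not part of the original analysis: it encodes the (passage, percent deleted) pairs listed for presentation orders 2 through 6 in the design table, and the function name is hypothetical.

```python
# (passage, percent deleted) for Groups 1-5, presentation orders 2-6.
DESIGN_IB = [
    [(1, 0), (2, 33), (3, 100), (4, 17), (5, 50)],
    [(2, 17), (3, 50), (4, 0), (5, 33), (1, 100)],
    [(3, 33), (4, 100), (5, 17), (1, 50), (2, 0)],
    [(4, 50), (5, 0), (1, 33), (2, 100), (3, 17)],
    [(5, 100), (1, 17), (2, 50), (3, 0), (4, 33)],
]


def is_greco_latin(square):
    """True if both factors form Latin squares and every pairing occurs once."""
    n = len(square)
    for layer in (0, 1):  # 0 = passage, 1 = deletion condition
        rows = [[cell[layer] for cell in row] for row in square]
        cols = [list(col) for col in zip(*rows)]
        if any(len(set(line)) != n for line in rows + cols):
            return False
    # Orthogonality: each passage meets each deletion condition exactly once.
    pairs = {cell for row in square for cell in row}
    return len(pairs) == n * n
```

Under these assumptions, `is_greco_latin(DESIGN_IB)` confirms that each passage and each deletion condition appears once per group and once per order position, and that every passage is paired with every deletion condition exactly once.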

Each session containing a group of 3 or 4 Ss was administered tests in one
of the five groups noted in Table 5. The five groups were tested in order but
alternating between MC-Test (Type I) and RS-Test (Type II) groups. The design
was completed with 20 Ss (4 Ss per group) for each type of test. Since there
were 24 Ss in the Type I sessions and 24 Ss in the Type II sessions, one group
of four Ss of each type was an unanalyzed replication.

Results and Discussion. Fig. 6 contains the understanding results for
both the RS-Test and MC-Test groups. As in Experiment IA, the data points are
the median percent understanding ratings for the 20 values for each percent
deletion condition.

The two curves have the same general shape, even though they are not
nearly as coincident as were the understanding curves in Experiment IA. These
data do suggest reliability for the understanding variable plus comparability of
RS- and MC-Test groups.

Fig. 6. Understanding as a function of the percent of passage deletion
for the RS-Test Group and the MC-Test Group, Experiment IB (N=40).

For the RS- and MC-Test groups combined, there is a decrease in percent
understanding from around 81% in the 0% condition (undeleted) to around 35%
in the 50% condition. The 81% understanding estimate for the undeleted condition
is about the same as was obtained for the same paragraphs in the previously
mentioned Carver (1973a) study. That is, the average rate at which these five
paragraphs were presented was about 121 swpm, and the understanding estimate
at 121 swpm in the Carver (1973a) study can be interpolated to be about 74%
for all three test groups. This result further supports the reliability of
the understanding variable, and replicability of the results in both experiments.

Fig. 7 contains the percent correct scores on both the RS- and MC-Tests.

Fig. 7. Test scores as a function of the percent of passage
deletion for the RS-Test and the MC-Test, Experiment IB (N=40).

Both curves have the same general shape, which suggests that both measures are
sensitive to the decrease in information presented. The RS-Test had a
much larger decrement than the MC-Test between the 0 and 50% deletion conditions.

The efficiency ratios for the gain between the non-reading and reading
conditions are .34 and .44 for the RS- and MC-Tests, respectively. In this
experiment, the MC-Test appears to be more valid than the RS-Test. However, since
both tests seemed to be measuring approximately the same thing in Experiment IA,
it does not appear reasonable to interpret the small differences between these
two efficiency ratios as being real, nor should the differences in the two curves
in Fig. 7 be interpreted as real. These differences are more likely to have
resulted from uncontrolled within-individual differences. It does not seem
likely that the RS-Test is the more sensitive to decrements between the 0% and
the 50% deletion conditions while the MC-Test is more sensitive to decrement
from the 50% to the 100% deletion condition. It seems more likely that both
tests are measuring the same thing except for chance variations. The most
representative curve showing the relationship between test scores and
percent deletion could be found by combining the data from both the RS- and
MC-Test curves in Fig. 7.

In summary, both the RS-Test and the MC-Test appear to be sensitive to
decrements in the amount of information presented. The RS-Test appeared to be
slightly more valid in Experiment IA and the MC-Test appeared to be slightly more
valid in Experiment IB. The RS- and MC-Tests appear to be approximately equal
with respect to reflecting the primary effects of reading.

Experiment IC.

Purpose. The purpose of this experiment was to investigate the effect of
forgetting upon the RS-Test scores. Forgetting was manipulated by varying the
number of passages and tests which were interpolated between the reading of a
passage and the subsequent administration of the test on the passage.

Subjects. There were 24 Ss in the Type I sessions and 24 Ss in the Type II
sessions.

Procedure. The Ss were informed at the outset that they would be administered
six tests during this last experiment. The types of tests were to be exactly the
same as those in the preceding experiment. They were also informed that they would
be given an entire paragraph to read, as was the case in Experiment IA, but they
would not be administered the tests immediately following their reading of the
paragraphs. Instead, they were told that they would be given the entire set of
passages to read before they started taking the tests on the passages. They were
also told that the order of the tests would be reversed so that they would take
the test on the last paragraph they read first, and so on. The last
test, the sixth test, would be one that would be administered without prior
reading of the passage, i.e., the non-reading condition.

Again, those Ss receiving the MC-Tests (Type I session) were told that
they would be given 1¢ for each correct answer, and those Ss receiving the
RS-Tests (Type II session) were told that they would be given [amount illegible]
for each correct answer.

Tests. The six 100-word passages used in Experiment IC were developed
from the six passages contained in Form B of the Carver-Darby Chunked Reading
Test according to the same procedures outlined in Experiment IB. The RS-
and MC-Tests were developed from the experimental passages according to the same
procedures as were outlined in Experiment IB.

Design. Table 6 contains the design for Experiment IC. The 6 by 6 design
for the tests is a Latin-Square completely counterbalanced for immediate sequential
effects. The passage presentation order is the reverse of the test order except
for the last test; instead of being given first, the passage for the last test
was not presented at all. Thus, the Latin-Square controls for passage differences
and individual differences while reflecting the effect of interpolated activity
upon test scores. Test 1 represents zero (0) interpolated activity since there
was nothing interpolated between the last passage that was read and the first
test, which was on the last passage. Test 6 represents infinite (∞)
interpolated activity since the passage had not been read prior to taking the
test, i.e., the non-reading condition.

Table 6
Design for Experiment IC

[Table body not recoverable. Column headings: Group, Passage Order, Testing Order.]
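
A Latin square that is counterbalanced for immediate sequential effects can be built in several ways. The sketch below is not the authors' Table 6; it shows one standard construction (often called a Williams design) for an even number of treatments, with hypothetical function names, purely to make the property concrete: every treatment appears once in each row and column, and every ordered pair of adjacent treatments occurs exactly once.

```python
def williams_square(n):
    """Build an n x n Latin square balanced for first-order carryover (even n)."""
    if n % 2:
        raise ValueError("this construction needs an even n")
    # First row zigzags between low and high treatments: 0, 1, n-1, 2, n-2, ...
    first = [0]
    lo, hi = 1, n - 1
    while len(first) < n:
        first.append(lo)
        lo += 1
        if len(first) < n:
            first.append(hi)
            hi -= 1
    # Each later row shifts every entry by one (mod n); labels run 1..n.
    return [[(x + shift) % n + 1 for x in first] for shift in range(n)]


def is_sequentially_balanced(square):
    """Check the Latin property and that each ordered adjacent pair occurs once."""
    n = len(square)
    latin = all(len(set(row)) == n for row in square) and all(
        len(set(col)) == n for col in zip(*square)
    )
    pairs = [(row[i], row[i + 1]) for row in square for i in range(n - 1)]
    return latin and len(set(pairs)) == n * (n - 1)
```

For six tests, `williams_square(6)` gives six group orders in which each test immediately follows every other test exactly once.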

There were four Ss per each of the six groups in Experiment IC, N=24. The
study was replicated for the MC-Tests and the RS-Tests, thus making the total
of 48 Ss for Experiment IC. The six groups were tested in order, but
alternating between Type I and Type II sessions.

Results and Discussion. Fig. 8 presents the RS- and MC-Test scores as a
function of the degree of interpolated activity. The two curves are almost
perfectly parallel. The RS-Test appears to be slightly more difficult than the
MC-Test, as was also the case in both Experiments IA & IB. The MC-Test reflects
more of a drop from no interpolated activity and it has a flatter plateau.
Again, however, it seems more reasonable to interpret variations between these
two curves as being most likely attributable to uncontrolled within-individual
variations. Again, the most representative curve shape for both tests is
probably the one formed by combining the data from both groups. The combined
curve is also presented in Fig. 8. There appears to be a very slight drop in
test scores with a small amount of interpolated activity but little drop
thereafter, given the amounts of interpolated activity studied in this experiment.

Fig. 8. Test scores as a function of the amount of interpolated
activity for the RS-Test and the MC-Test and the combination of scores
from both tests, Experiment IC (N=48).

It should be noted that the two curves were parallel in spite of the fact
that the actual quality and quantity of interpolated activity was not the same
in both groups. Both groups were given the same amount of time to read the
paragraphs, but the RS-Test group received only three minutes to circle answers
while the MC-Test group received seven minutes to fill in blanks with words.
The actual amount of time between the reading of the first paragraph and taking
the test on it was about 19 min. for the RS-Tests and about 35 min. for the
MC-Tests. Since the interpolated activity was highly similar to the criterion
task, there is probably not much, if any, forgetting associated with
either test during at least a day's time. In absolute terms, there appears to
be a drop of about 8 percentage points due to interpolated activity, and this
leaves a 20 percentage point residual over the score that could be expected
without reading. The efficiency ratio can be used to interpret the amount of
forgetting that occurs with interpolated activity. That is, an efficiency ratio
of about .40 can be used to convert a test score loss of 8 percentage points
into a loss of 20% in accuracy of standard thoughts stored. Therefore, it may
be said that these Ss experienced about a 20% loss in information stored due
to the forgetting that occurs with interpolated activity (or the passage of time)
but they still remembered about 50% (20/.40) of the original information after
a great deal of interpolated activity.
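
The conversion used in this paragraph is a single division, shown here as a small worked example. The function name is hypothetical; the figures (8 points lost, 20 points residual, efficiency ratio of about .40) are the ones quoted in the text.

```python
def points_to_information(test_score_points, efficiency_ratio):
    """Convert a test-score difference (percentage points) into percent of information."""
    return test_score_points / efficiency_ratio


EFFICIENCY_RATIO = 0.40  # approximate value quoted in the text

# An 8-point drop in test score corresponds to about a 20% loss of stored information.
loss = points_to_information(8, EFFICIENCY_RATIO)

# The 20-point residual over the non-reading score corresponds to about 50% retained.
retained = points_to_information(20, EFFICIENCY_RATIO)
```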

These data can be interpreted as favorable for the use of the RS-Test.
If the scores had dropped sharply with any interpolated activity or had dropped
to that of the non-reading condition with these amounts of interpolated activity,
then the data would not seem to have been reflecting information stored. Instead
the RS-Test would have been reflecting an immediate, or short term, memory for
words. These data suggest that the RS-Test is indeed reflecting the type of
information that is stored during normal reading because this type of information
does seem to fade slightly immediately after it is stored, but it certainly
does not fade to anywhere near zero within the same day that it was stored.

Conclusions and Implications.

From these data it may be concluded that the reading-storage type of test
is valid as an indicator of the primary effects of reading. None of the various
cloze techniques seem to be more valid than the reading-storage test while some,
including the regular cloze procedure, appear to be considerably less valid.
The modified cloze test seems to be more valid than other varieties of the
cloze procedure, especially the regular cloze test, and it appears to be about
as valid as the reading-storage test. Therefore, the reading-storage test
appears to be at least as valid as other objectively developed types of test.
Since cloze tests are not 100% objective, i.e., they must be scored subjectively,
the reading-storage test would seem to be preferred over cloze tests as a
measure of the primary effects of reading.

Two questions about the validity of the reading-storage test remain to
be explored. First, how does it compare to the highly recommended, but
subjectively developed paraphrase test? Second, just how much is the
reading-storage test influenced by the short-term memorization of words? The
forgetting curve data in Experiment IC suggested that the reading-storage test
was sensitive to the primary effects of reading as opposed to the memorization
of words, but more and different evidence is needed here. The next set of
experiments will provide further evidence relevant to these aspects of the
validity of the reading-storage test.

Overview.

Purpose. There were three primary purposes in the Set II Experiments: (a)
to determine the extent to which the RS-Test is influenced by the memorization
of words, (b) to compare the RS-Test to the paraphrase test, and (c) to determine the
effect of programmed prose upon learning from prose materials under low and
high motivation conditions.

There were four experiments relevant to these purposes. One experiment
was relevant to the first purpose. All four experiments were relevant to the
second purpose. There was also one experiment relevant to the third purpose.

Subjects. Fifty-eight college students were paid to participate. Again,
as in the Set I Experiments, the volunteers were recruited from the University
of Maryland via an advertisement in the school newspaper. Again, the
advertisement referred to an educational research project without explaining the
nature of the experiment, i.e., that it involved the administration of reading
tests.

Procedure and Instructions. There were five experimental sessions with
the size of each group ranging between 11 and 12 individuals. At the outset,
the Ss were told that: (a) they would be paid $10.00 in cash at the end of
their four hours of participation, (b) there would be three 10-min. breaks during
the afternoon, (c) they would be taking a number of short reading tests, and
(d) their scores on the tests would be mailed to them. After these general
instructions, a set of standardized tests was administered, followed by
Experiment IIA, Experiment IIB, Experiment IIC & Experiment IID.

Standardized Tests. All 58 Ss were administered two standardized tests,
the Basic Reading Rate Scale (BRRS) and the Reading Level 4 (RL-4) test. These
two tests were administered primarily to provide a control over individual
differences in Experiment IIB. The BRRS is a published test which measures the
rate at which easy prose can be read under conditions which control for
comprehension. It provides for the categorization of readers into four types:
Beginning Readers, Good Readers, Better Readers, and Best Readers.

The RL-4 is an unpublished test which was developed by applying the
reading-input technique to five 100-word paragraphs at the college level of
difficulty, i.e., at RIDE Level 4. The Ss were instructed to try to get as many
right as possible in as short a length of time as possible. The primary score
on this test was an efficiency score, i.e., number correct (corrected for
guessing) divided by the time required in min.
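
As a rough illustration of such an efficiency score, the sketch below divides a guessing-corrected score by the time taken. The report does not state which correction was applied; the conventional formula (rights minus wrongs divided by the number of alternatives less one) is assumed here, and the function and parameter names are hypothetical.

```python
def efficiency_score(n_correct, n_wrong, n_choices, minutes):
    """Guessing-corrected number correct per minute.

    Assumption: the correction is the conventional R - W / (k - 1);
    the report itself does not give the formula.
    """
    if n_choices < 2 or minutes <= 0:
        raise ValueError("need at least two choices and a positive time")
    corrected = n_correct - n_wrong / (n_choices - 1)
    return corrected / minutes
```

Under this assumption, an S with 40 right and 8 wrong on 5-choice items in 4 min. would score (40 - 2) / 4 = 9.5 corrected items per min.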

Scoring and Data Analysis. The scoring and data analysis in the Set II
Experiments were accomplished in exactly the same manner as in the Set I
Experiments except for the programmed prose experiment, Experiment IIB. The
scoring and data analysis for Experiment IIB is explained in detail later.
The scoring of the paraphrase test was performed in the same manner as the RS-Test.

Experiment IIA.

Introduction. The primary purpose of this experiment was to investigate
the extent to which the RS-Test reflects the primary effects of reading as
opposed to simply the memorization of words.

In order to achieve this purpose, it seemed desirable to be able to
manipulate information stored, understanding, or comprehension under conditions
wherein memorization for words should remain constant. Bransford and Johnson (1972)
have recently developed two passages which seem to permit this type of
experimental manipulation. When these passages are administered with context cues,
high comprehension supposedly results, but when these passages are administered
without context cues little or no comprehension results. For example, when a
picture is viewed briefly, prior to reading, individuals seem to have no trouble
in comprehending what they read. However, individuals who are given the same
passage without the opportunity to view the picture seem to comprehend little
of what they read.

One way to test the validity of the RS-Test would be to administer it under
both conditions, i.e., with and without context cues. If the RS-Test is primarily
reflecting the ability of individuals to memorize words, then there should be
little or no difference between the RS-Test scores under these two conditions
since the opportunity for memorization of words is equal. If the RS-Test is
primarily reflecting the normal effects of reading, then there should be a large
gain when the context cue is given.

Experiment IIA used these two specially developed passages to investigate
the sensitivity of the RS-Test to the primary effects of reading. Furthermore,
paraphrase test (P-Test) questions also were developed on these same passages
so that the validity of the RS-Test could be compared to the validity of the P-Test.

Subjects. There were a total of 48 Ss in this experiment. These Ss
were the first 48 of the total of 58 who participated in the Set II Experiments.

Procedure and Instructions. The Ss were given instructions, examples,
and practice on (a) the RS-Test, (b) the P-Test, and (c) the understanding
judgments. These procedures were the same as those employed in the Set I
Experiments except that in this experiment all the Ss learned how to take both
types of tests before they were actually given either type of test. Also, the
Ss in this experiment did not have the opportunity to practice taking a test
under the non-reading condition.

Passages. Passage A had a picture as a context cue. The passage
contained 132 words and 9 sentences, and was at RIDE Level 3 difficulty,
4.6 letters per word. Passage B had a two-word title for a context cue.
The passage originally contained 181 words and 15 sentences. However, to
make it comparable in length to Passage A, the last 4 sentences were deleted.
Thus, Passage B contained 136 words and 11 sentences, and was at RIDE Level 2
difficulty, 4.4 letters per word. These two paragraphs actually had very close
difficulty estimates, 4.6 and 4.4, with Passage A being at the bottom of Level
3 and Passage B being at the top of Level 2.

Tests. The RS-Tests were developed for the two passages according to
the standard algorithm discussed in the Set I Experiments.

The P-Test questions were developed according to the recommendations given
by Anderson (1972). Anderson defined two statements as being paraphrases of
one another if (a) they have no substantive words (nouns, verbs, modifiers)
in common and (b) they are equivalent in meaning [p. 150]. Anderson then states
that to form a test item from a paraphrase, ". . . you delete an element, which
is to be supplied or identified by the student, or you transform a segment of
the paraphrased statement into a question [p. 151]." These rules of Anderson's
were followed closely in the initial stages of item development but insurmountable
difficulties developed. The level of redundancy between sentences in a passage
is such that E found it to be exceedingly difficult (approaching impossible) to
develop a paraphrase item for each sentence which E thought S could not get
correct without ever reading. It was suspected that Anderson had used his rules
only for isolated sentences instead of prose passages. The paraphrase guidelines
of Anderson were modified by writing 5-choice, multiple-choice items on each
sentence. The question and correct alternative represented a paraphrase of the
original sentence whereas the incorrect alternatives resulted in meaning changes

Design. There were two different passage conditions (Passage A and
Passage B), and there were two different test type conditions (RS-Test and
P-Test). There were three different reading conditions: Context, No Context,
and Non-Reading. In the Context condition, either the picture (for Passage A)
or the title (for Passage B) was given prior to reading the passage. In the
No Context condition, neither of the two context cues was given prior to
reading. In the Non-Reading condition, the test on the passage was given with
no opportunity to see the context cue or read the passage. Finally, there
were two different orders of presentation, first or second, and this
constituted the fourth factor that was manipulated.

There were a total of 24 different treatments (2 x 2 x 3 x 2). The first
24 Ss to participate in the five sessions were administered each of the 24
treatments, and the second group of 24 Ss were administered each of the 24
treatments again. The final 10 Ss were also administered these treatments,
but their data were not analyzed.
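
The 24 treatments are simply the full crossing of the four factors named above. A brief sketch that enumerates them (the variable names are illustrative; the factor levels are those given in the text):

```python
from itertools import product

PASSAGES = ["Passage A", "Passage B"]
TEST_TYPES = ["RS-Test", "P-Test"]
READING_CONDITIONS = ["Context", "No Context", "Non-Reading"]
ORDERS = ["first", "second"]

# Full factorial crossing: 2 x 2 x 3 x 2 = 24 treatments.
TREATMENTS = list(product(PASSAGES, TEST_TYPES, READING_CONDITIONS, ORDERS))
```

Each element of `TREATMENTS` is one (passage, test type, reading condition, order) combination.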

Time Limits and Control Pages. The passages and tests were assembled into
a booklet with a cover page. The Ss were given 15 sec. to look at the first
page in their booklet. For the Ss in the Context condition, this was either
a picture or a title. For the Ss in the No Context and Non-Reading conditions,
this was a control page containing brief instructions to the Ss to keep their
eyes on their own tests and not to turn the page until they were told to do so.

The Ss were given one min. to look at the next page. For the Ss in the
Context and No Context conditions, this page contained either Passage A or B.
For the Ss in the Non-Reading condition, this page contained directions to
simply sit quietly until the time limit was up, since they would not have the
opportunity to read the passage before taking the test on it.

The test followed the second page. The Ss were given three minutes to
work on the test and this seemed to be ample time for about 90 - 100% of the Ss
to finish.

After the test on the first passage was completed, the 15 sec., 1 min., and
3 min. time limits were repeated for the second passage treatments.

Results and Discussion. Fig. 9 contains a plot of the median values for
the understanding data under the No Context and Context conditions. All of the
Ss had been instructed to circle a zero understanding estimate under the
Non-Reading condition so these data have not been plotted. Each data point in
Fig. 9 represents eight Ss. There is a curve for each test-passage treatment,
i.e., RS-Test A, RS-Test B, P-Test A, and P-Test B. Notice that there is an
increase of about 30 percentage points in the understanding ratings for three
of the four treatments. There was little or no gain associated with the P-Test B
treatment, i.e., 7 percentage points. These data suggest that there was something
different about the individuals in each of the two groups representing the two
data points for P-Test B. This rating was made prior to taking the test so that
there should be no difference, except for the uncontrolled individual differences,
between this curve and the RS-Test B curve. Taken collectively, the four lines in
Fig. 9 support the findings of Bransford and Johnson (1972) which indicate that
context cues are important for comprehension or understanding.

Fig. 9. Understanding as a function of the
context condition for Passage A and Passage B as estimated
by individuals in the RS-Test Group and the P-Test Group,
Experiment IIA (N=48).



Fig. 10 contains a plot of the median values for both tests on both
passages under all three reading conditions. Each data point in Fig. 10 also
represents eight Ss. It may be noted that the gain due to context, i.e.,
from the No Context to the Context conditions, was substantial for the RS-Test,
i.e., 26 and 29 percentage points for Passages A and B, respectively. This
finding is even more impressive when it is compared to the corresponding
percentage point gains for the P-Test, 34 and -20, respectively. The -20 gain
for the P-Test B condition should be discounted because the understanding data
in Fig. 9 indicate that the comprehension of the two groups involved in this
gain score was not equivalent to the other groups prior to taking the tests.
These data suggest that the RS-Test does reflect the primary effects of reading
and reflects these effects just as well if not better than the paraphrase type
of test.

Fig. 10. Test scores as a function of reading condition
for Passage A and Passage B as measured by both the RS-Test and
the P-Test, Experiment IIA (N=48).

The gain from the Non-Reading to the No Context condition might have been
interpreted as being primarily due to memorization were it not for the fact
that the gain on the P-Test was so substantial. Both the P-Test A and the
P-Test B reflected approximately the same percentage point gains from the
Non-Reading to the No Context condition, i.e., 34 and 37, respectively. The
corresponding gains for the RS-Test were 42 and 6, respectively. The average gain
for the P-Test was 35.5 and the corresponding value for the RS-Test is 24.0. If
the gain from Non-Reading to No Context is primarily due to memorization, then
the paraphrase type of test seems to be more influenced by memorization than
does the reading-storage type of test.

The preceding analysis and interpretation of the data is summarized by the
efficiency ratios presented in Table 7. These data represent the average test
score gains and understanding gains for the two passages (A and B) combined.

Table 7
Percentage Point Gains and Efficiency Ratios for Experiment IIA

                                  Percentage Point Gain
                                                             Efficiency
                                Test Score   Understanding     Ratio

RS-Test
  Non-reading to No Context        24.0          52.5           .46
  Non-reading to Context           51.5          83.5           .62

P-Test
  Non-reading to No Context        35.5          66.0           .54
  Non-reading to Context           42.5          84.1           .50



Notice that the efficiency ratio for the Context condition was higher for the
RS-Test, .62, than it was for the P-Test, .50, and the efficiency ratio for
the No Context condition was lower for the RS-Test, .46, than it was for the
P-Test, .54. Thus, these efficiency ratios also suggest that the RS-Test is
more sensitive to the primary effects of reading than is the P-Test, and it is
less sensitive to the effects of word memorization than is the P-Test.
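
The efficiency ratios in Table 7 are consistent with dividing each test score gain by the corresponding understanding gain. That reading is an inference from the tabled values rather than a definition stated in this section, and the names below are illustrative. The sketch recomputes each ratio from the two gains and compares it with the tabled value.

```python
# (test score gain, understanding gain, tabled efficiency ratio) from Table 7.
TABLE_7 = {
    ("RS-Test", "Non-reading to No Context"): (24.0, 52.5, 0.46),
    ("RS-Test", "Non-reading to Context"): (51.5, 83.5, 0.62),
    ("P-Test", "Non-reading to No Context"): (35.5, 66.0, 0.54),
    ("P-Test", "Non-reading to Context"): (42.5, 84.1, 0.50),
}


def efficiency_ratio(test_score_gain, understanding_gain):
    """Assumed definition: test score gain per unit of understanding gain."""
    return test_score_gain / understanding_gain


RECOMPUTED = {
    key: efficiency_ratio(score_gain, understanding_gain)
    for key, (score_gain, understanding_gain, _) in TABLE_7.items()
}
```

The recomputed values agree with the tabled ratios to within .01. The last row works out to about .505, so the tabled .50 may reflect truncation or a slightly different understanding gain than the one printed.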

The evidence from this experiment further supports the validity of the
RS-Test as a measure of the primary effects of reading. There is no evidence
that scores on the RS-Tests are influenced significantly by the memorization of
words. It seems to be just as sensitive to the primary effects of reading as
is the paraphrase test.

The experiment which followed this experiment was Experiment IIB. However,
Experiment IIB will not be presented next because its primary purpose was not
to compare the RS- and P-Tests. Experiment IIC and Experiment IID, which
follow, were designed to compare further the RS-Test and the P-Test.

Experiment IIC.

Introduction. The primary purpose of this experiment was to investigate
the sensitivity of the RS-Test to the primary effects of reading as compared to
the P-Test. Again, as in Experiment IA, the differences between reading and
non-reading paragraphs at four different levels of difficulty were investigated.

For the RS-Test, this experiment was almost an exact replication of
Experiment IA. This time, however, an additional dependent variable was added.
The passages were typed in all capital letters with sentence ending punctuation
and spacing cues omitted, and the S's task was to indicate where he thought the
sentences ended by placing an X mark there, as he read the passage. If the S
could not mark correctly the sentence ending punctuation, then he could not be
expected to have stored the information. This task provided objective evidence
regarding the comparability of the individuals after reading and before they
took either the RS- or P-Test.

Subjects. As in Experiment IA, there were a total of 48 Ss who
participated in this experiment. These Ss were the first 48 of the total of 58 who
participated in the Set II Experiments.

Procedures and Instructions. The procedures and instructions were exactly
the same as in Experiment IA with the following exceptions: (a) the Ss received
instructions and practice regarding the placement of an X mark between sentences
as they read the passages; and (b) each S was administered both types of tests
without knowing which type would follow while he was reading a particular passage.

During the experiment which preceded this experiment, i.e., Experiment IIB,
the Ss were informed that they would receive bonuses for their performance
during the remainder of the session. For this experiment, Experiment IIC,
they were told that they would receive 2¢ for each sentence they marked correctly.
This scoring system was explained to them as follows: "If you fail to mark the
end of the sentence or, if you place an extra X mark somewhere else in the
sentence, the sentence will be scored as incorrect and you will not receive 2¢
for it." The Ss were also told that they would receive 2¢ for each answer they
marked correctly on the 10 tests.
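
The sentence-marking rule quoted above can be expressed as a small scoring routine. This is a sketch under stated assumptions: the function name is hypothetical, and both the true sentence endings and the S's X marks are represented as word positions (the index of the word after which the sentence ends or the mark was placed), a coding scheme the report does not describe.

```python
def score_sentence_marks(true_ends, marks, cents_per_sentence=2):
    """Return (sentences marked correctly, bonus in cents).

    A sentence counts as correct only if its ending is marked and no
    extra X mark falls anywhere else inside that sentence.
    """
    marks = set(marks)
    correct = 0
    previous_end = 0
    for end in sorted(true_ends):
        # Marks that fall within this sentence, including its ending position.
        in_sentence = {m for m in marks if previous_end < m <= end}
        if in_sentence == {end}:
            correct += 1
        previous_end = end
    return correct, correct * cents_per_sentence
```

For a passage whose sentences end after words 5, 12, and 20, marks at 5, 9, 12, and 20 would earn credit for the first and third sentences only, because the extra mark at 9 spoils the second.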

Passages. The passages were exactly the same as in Experiment IA except
they had been retyped entirely in capital letters with sentence ending
punctuation and spacing cues omitted.

Tests. The RS-Tests were exactly the same tests as were used in Experiment IA.

The P-Tests were developed by E on each passage using the same item
development guidelines as were described in Experiment IIA.

Design. The design of Experiment IIC was exactly the same as Experiment
IA, except for one difference. Instead of each S receiving only one type of
test, i.e., an RS-Test group and a P-Test group, each S received both types of
tests according to the design presented in Table 8.

Table 8
Design for Experiment IIC

                                      Order
             1P    3P    1     2     3     4     5     6     7     8

Condition    R*    R*    NR    R     NR    R     NR    R     NR    R

Group A      RS**  P**   P     RS    RS    RS    P     P     RS    P
Group B      P     RS    RS    P     P     P     RS    RS    P     RS

* R is the reading condition and NR is the non-reading condition.
** RS signifies the RS-Test and P signifies the P-Test.

Again, there were 10 tests altogether but the first two, 1P and 3P, were
exactly the same for all Ss in each of Groups A and B and they were regarded
by the E as practice. The remaining 8 tests were varied according to the same
Latin-Square design as was used in Experiment IA except that one group of Ss,
Group A, received a different type of test on each passage at each corresponding
order position as compared to the other group, Group B. The two symbols in the
condition row of Table 8 signify the reading and non-reading conditions,
respectively.

The first 8 Ss tested were placed in Group A, the second 8 Ss tested were
placed in Group B, and this alternating between Group A and Group B continued
for each consecutive set of 8 Ss until the data from all 48 Ss had been collected.

Results and Discussion. Fig. 11 contains the median percent of the
sentences marked correctly at each level of difficulty for those Ss who were
subsequently administered RS-Tests and P-Tests. If the two groups and two sets
of conditions were equivalent, then these curves would be perfectly coincident,
and the two curves were almost perfectly coincident. For the first three
levels of difficulty the individuals who took the RS-Tests scored slightly
higher than those who took the P-Tests. At the highest level of difficulty,
Level 4, the RS-Test individuals scored slightly lower than those individuals
who took the P-Tests. These results indicate that around 80 - 90% of the
sentences could be determined correctly by the Ss, no matter what the difficulty
level.

Fig. 11. Percent of sentences marked correctly as a function
of difficulty level for those individuals who were administered RS-Tests
and P-Tests, Experiment IIC (N=48).

Fig. 12 presents the median percent understanding estimates at each level
of difficulty for those who took the RS-Tests and those who took the P-Tests.

Fig. 12. Understanding as a function of difficulty
level for those individuals who were administered RS-Tests and
P-Tests, Experiment IIC (N=48).

The RS-Test individuals reported almost the same degree of understanding as
the P-Test individuals on the first three levels, but at Level 4 the P-Test
individuals reported a somewhat higher understanding estimate. These data
parallel the sentence data in Fig. 11, except the interaction at Level 4 is
somewhat greater for the understanding judgments than it was for the sentence
marking task. These data in Fig. 12 essentially replicate the corresponding
data from Experiment IA presented in Fig. 4. Understanding in this experiment
seems to be slightly below that of Experiment IA, as would be expected from
the fact that in Experiment IIC the Ss had to figure out where the sentences
ended while they were reading.

The data from Fig. 11 and Fig. 12 indicate that after reading and prior to
taking either the RS- or P-Tests, the individuals at Levels 1-3 were almost
exactly equivalent with respect to storing the information contained in the
passages. The RS-Test individuals seemed to have understood slightly more
than the P-Test individuals for Levels 1-3. At Level 4 the P-Test
individuals seemed to have understood more.

Fig. 13 contains the RS-Test and P-Test scores for the four levels of
difficulty and under both the reading and non-reading conditions. Under the
reading condition, the two curves are roughly parallel. Their largest
difference is at Level 4 where the P-Test individuals scored much higher than
the RS-Test individuals, but these data are in keeping with the differences
found earlier in sentence marking (Fig. 11) and understanding estimates
(Fig. 12). Under the non-reading condition, the P-Test scores are highly
erratic from level to level. About 44% of the Level 2 questions could be
answered without ever reading the passages while about 0% of the Level 4
questions could be answered without reading them. The RS-Test appears
to present the most consistent results, although on the average the P-Test
appears to reflect the most gain due to reading. This impression from Fig. 13
is supported by the gain data for the four levels presented in Table 9. The
average gain for the P-Test is 55.0 percentage points and the average gain for
the RS-Test is 37.0 percentage points. Therefore, the P-Test seems to be slightly
more valid, on the average, than the RS-Test for reflecting the primary effects
of reading, in this experiment. However, the standard deviation of the P-Test
gains, 15.1, was almost twice as large as that of the RS-Test, 9.7. Thus, the
RS-Test appears to be more consistent, i.e., reliable. This interpretation also
is supported by the efficiency ratio data in Table 9.

Fig. 13. Test score as a function of difficulty level for
the RS-Test and the P-Test under both reading and non-reading conditions,
Experiment IIC (N=48).

The average efficiency ratio was higher for the P-Test, .73, as compared to the
RS-Test, .51. However, the standard deviation of the four P-Test efficiency
ratios, .24, was three times greater than the standard deviation of the
four RS-Test efficiency ratios, .08.
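
The summary values quoted from Table 9 can be checked with a short calculation.
The sketch below rests on two assumptions inferred from the tabled values rather
than stated in the text: that the standard deviations are population standard
deviations over the four levels (dividing by 4), and that each efficiency ratio
is the test score gain divided by the understanding value at the same level.
The variable and function names are illustrative.

```python
from math import sqrt

# Values for Levels 1-4 as listed in Table 9.
table9 = {
    "RS-Test": {"score": [37, 49, 40, 22], "understanding": [96, 87, 67, 47]},
    "P-Test": {"score": [64, 29, 61, 66], "understanding": [87, 82, 63, 76]},
}


def mean(values):
    return sum(values) / len(values)


def population_sd(values):
    """Standard deviation dividing by n (assumed, since n - 1 does not match)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def summarize(test):
    score = table9[test]["score"]
    understanding = table9[test]["understanding"]
    # Assumed definition: gain in test score per point of understanding.
    ratios = [s / u for s, u in zip(score, understanding)]
    return {
        "mean_gain": mean(score),
        "sd_gain": population_sd(score),
        "mean_ratio": mean(ratios),
        "sd_ratio": population_sd(ratios),
    }


summary = {test: summarize(test) for test in table9}
```

Under these assumptions the mean gains come out at 37.0 and 55.0 percentage
points and the standard deviations of the gains at about 9.7 and 15.1, in
agreement with the text. The efficiency ratio summaries agree with the tabled
values to within rounding, since the table appears to average ratios that had
already been rounded to two decimals.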

These data may also be compared to the data in Table 4, i.e., the efficiency
ratios for a hypothetical passage having a difficulty index of 5.06 being read
at 113 swpw. In this experiment, these efficiency ratios were .54 and .91 for the
RS- and P-Tests, respectively. The efficiency ratio for the RS-Test, .54, was
somewhat higher than it was in Experiment IA, .43. The efficiency ratio for the
P-Test, .91, was even higher than the chunked test, .80, and it approached the
ideal value, 1.00.

These data reflect the item writing ability of the E for the P-Test
as well as the nature of the two tests. It appears that E did not do a good
job of writing the Level 2 items. However, since the E used the same techniques
as guidelines for all tests, these data also reflect the unreliability of this
subjective method.

The RS-Test results for the non-reading condition represent an almost perfect
replication of the Experiment IA data. These data are presented in Fig. 14.
This result was expected because these conditions were exactly the
same between the two experiments. These data indicate one dimension of
passage difficulty, i.e., as passages become less difficult, as measured by
the RIDE Scale, more items can be answered correctly without reading.
This relationship appears to be approximately linear.

Table 9

Percentage Point Gains and Efficiency Ratios in Experiment IIC for
each Difficulty Level on the RS-Tests and P-Tests

                    Percentage Point Gain
                                                 Efficiency
                  Test Score   Understanding       Ratio

RS-Test
  Level 1             37            96              .39
  Level 2             49            87              .56
  Level 3             40            67              .60
  Level 4             22            47              .47
  Mean                37.0          74.3            .51
  Stand. Dev.          9.7          18.9            .08

P-Test
  Level 1             64            87              .74
  Level 2             29            82              .35
  Level 3             61            63              .97
  Level 4             66            76              .87
  Mean                55.0          77.0            .73
  Stand. Dev.         15.1           9.0            .24

Fig. 14. RS-Test score as a function of difficulty level
under the non-reading condition for Experiment IA (N=48) and
Experiment IIC (N=48).

Conclusion.

From this experiment it would appear that the P-Test is, on the average,
slightly more sensitive than the RS-Test to the primary effects of reading but
the RS-Test appears to be more reliable than the P-Test.

Introduction. The purpose of this experiment was to compare the RS-Test
and the P-Test under conditions wherein the information stored from a passage
was directly manipulated. This experiment was a replication of Experiment IB
with the following exceptions: (a) instead of comparing the MC-Test with the
RS-Test, the P-Test was compared to the RS-Test, and (b) the reading passages
had been typed in capital letters with sentence-ending punctuation and spacing
omitted so that the Ss had to mark the sentences as they read.

Subjects. There were 40 Ss who participated in this experiment. These
Ss were the first 40 of the total of 58 who participated in the Set II Experiments.

Procedures and Instructions. The procedures and instructions were exactly
the same as in Experiment IB with the exception that both the RS-Test Group and
the P-Test Group were informed at the outset that they would mark sentences as
they read passages in a manner similar to what they had done in the preceding
experiment, i.e., Experiment IIC.

Passages. The passages were exactly the same as in Experiment IB except
they had been retyped entirely in capital letters with sentence-ending punctuation
and spacing cues omitted. The deleted words were replaced with standard length
dashes so that the word length cue was not available in Experiment IID as it was
in Experiment IB.

Tests. The RS-Tests were exactly the same tests as were used in Experiment IB.
The P-Tests were developed by the E on each passage using the same item
development guidelines as were described in Experiment IIA.

Design. The design of this experiment was exactly the same as Experiment IB
except the P-Tests replaced the MC-Tests.

Results and Discussion. Fig. 15 contains the median percent of sentences
answered correctly as a function of the percent of the passage that had been
deleted. For the RS-Test Group there was a 63% decrease in the percent of sentences
answered correctly between the no-deletion (0% condition) and the every other
word deletion (50% condition), and this decrease was almost perfectly linear.
Also, this result was almost perfectly replicated by the P-Test Group, 56% decrease.
These data indicate that information stored was being directly manipulated by the
deletion treatment, and that both the RS- and P-Test groups were approximately
equal prior to being administered the test.

Fig. 15. Percent of sentences marked correctly as a
function of the percent of passage deletion for the RS-Test
Group and the P-Test Group, Experiment IID (N=40).

Fig. 16 presents the median understanding percents as a function of
the degree of deletion. There is a 49% drop in percentage points for the
RS-Test Group between the 0% condition and the 50% condition. There is a
corresponding 50% drop in the P-Test, and the P-Test Group data are almost
perfectly parallel to the RS-Test Group. However, for these data the P-Test
Group reports consistently higher understanding ratings, about 15% higher.
It appears that there may be somewhat of a "halo" type of effect for the
understanding estimates on the P-Test.

Fig. 16. Understanding as a function of the percent
of passage deletion for the RS-Test Group and the P-Test Group,
Experiment IID (N=40).

Even though the P-Test Group reported that they understood about 15% more
while they were marking fewer of the sentences correctly, the differences between
the two groups are relatively small. Thus, the data in Fig. 15 and Fig. 16 may
be interpreted as indicating that the two groups were approximately equal with
respect to the information they had stored. If there were differences between
the two groups, they were approximately equal at each level, thus having no
differential effect upon the general shapes of the RS-Test and P-Test curves.

Fig. 17 presents the RS-Test and P-Test scores as a function of the degree
of deletion. The two curves are almost perfectly coincident except for the
50% deletion condition where there is a difference of 22 percentage points.
For the other four points, the average difference is only about 7 percentage
points. The P-Test appears to present a more consistent decrease from the
0% deletion condition to the 50% deletion condition. However, the score on the
P-Test at the 50% condition equals that of the 100% condition, and this result
is inconsistent with the data in Fig. 15 and Fig. 16 which indicated that
something greater than zero had been stored under the 50% deletion condition. The
efficiency ratios for the reading versus the non-reading condition were as
follows: RS-Test, .53; P-Test, .28. These data appear to provide no clear-cut
support for the superiority of the P-Test over the RS-Test.

Fig. 17. Test score as a function of the percent of passage
deletion for the RS-Test and the P-Test, Experiment IID (N=40).

As in Experiment IB, the differences in Fig. 17 are probably due to randomly
distributed, within-individual differences, thus making the most representative
curve the one formed by combining the data from both groups. This averaging
was accomplished for both Experiment IB and Experiment IID, and these data are
presented in Fig. 18. The almost perfect coincidence in the two curves
supports the hypothesis that all three of the tests, RS-Test, MC-Test,
and P-Test, are reflecting approximately the same thing. The differences
between the curves in Fig. 7 and Fig. 17 are therefore most reasonably attributed
to random within-individual variance that is best controlled by large sample
sizes. The relatively consistent drop in scores from the 0% to the 50% deletion
condition is consistent with the two understanding curves (Fig. 6 and Fig. 16) and
the sentence marking curve (Fig. 15). The drop of approximately 50 percentage
points in understanding is associated with about a 20% drop in the objective
test scores, i.e., the efficiency ratio for all three of these tests is about
.40 in these two deletion experiments.

Fig. 18. Test scores as a function of the percent of passage deletion for
the RS- and MC-Test data combined in Experiment IB and for the RS- and P-Test data
combined in Experiment IID.

Conclusion. When information stored is directly manipulated by deleting
words from passages, the RS-Test and the P-Test appear to be approximately equal
in reflecting these changes in the primary effects of reading.

Experiment IIB

Introduction. The primary purpose of this experiment was to investigate
the effect of programmed prose upon information stored under two levels of
motivation, low and high. It was hypothesized that programmed prose would
facilitate the amount of information stored, i.e., amount learned, from difficult
material under a low motivation condition, but be an inhibitor under a high
motivation condition.

The hypothesis was tested by developing programmed prose materials on a
lengthy and relatively difficult passage taken from a journal article on reading
research. The regular passage and the programmed prose were then administered
to college students under the two motive-incentive conditions: (a) low: the
Ss were given the prose materials, and instructed to read the material carefully
so that they could do their best on the test that would be administered
afterwards, and (b) high: the Ss were informed that they would be paid a bonus for
each question they answered correctly on the test afterwards. There was also
a control group which was administered the test questions without reading. This
control group was then given the regular prose, and the test was administered
again afterwards. These control data were collected so as to be able to interpret
the scores from the experimental groups on an absolute scale varying from what
might be expected under conditions where scores on the test should be minimal and
maximal.

Subjects. The Ss were the entire set of 58 college students who participated
in the Set II Experiments.

Design. The overall design for the experiment is presented in Table 10.
One group (A) received the regular prose under the low motivation condition.
Another group (B) received the programmed prose under the low motivation condition.
A third group (C) received the regular prose under the high motivation condition,
and a fourth group (D) received the programmed prose under the high motivation
condition. A fifth group (Control), not represented in Table 10, took the test under
two different treatment conditions: prior to reading the passage and after
reading the passage. These two treatments were referred to as Pre-Control and
Post-Control.

Table 10

Design for Experiment IIB

                        Motivation Condition

                        Low          High

Regular Prose           Group A      Group C

Programmed Prose        Group B      Group D

There was only one treatment administered per session, i.e., one group was
tested each day for five consecutive days. The order of the five groups was
as follows: D and Control.

Part of the success of the manipulation of motivation depended upon the Ss
in Group A and Group B being naive with respect to the experiment. If, for
example, a person in Group B, the first group, informed a friend in Group A,
the second group, that he would be paid for how well he did, the person in Group
A would actually belong in Group C. Since the subject pool was extremely large,
it was not likely that such communication between groups would occur over a
20 hr. period. However, three factors were implemented to facilitate better
control over this problem. First, the Ss in Group B were informed, at the end
of the session, that all groups participating would receive
different bonus conditions and were asked not to divulge the nature of the
experiment because it might have an adverse effect upon a friend's score. Second,
the Ss in Group A were administered a questionnaire after they had been paid at
the end of the experiment, inquiring as to whether they had received prior
information about the bonus system. None admitted this knowledge. Third,
the regular prose group, Group A, was tested second so that if there was
communication between individuals in the two groups, it would benefit the
regular prose group and thus tend to provide evidence against the hypothesis
being tested.

Procedures and Instructions. At the outset, every group was informed that they
would be given a long passage to read, and given a 100-item test on the passage
afterwards. They were told that the test included 50 RS-Test items and 50
P-Test items similar to those they had taken in the preceding experiment,
Experiment IA.

The Ss were told that the passage was somewhat like a test itself since
there was an item every fifth word. The programmed prose groups were told to
look at the first page of the passage and note that an X had already been placed
in the first box. They were instructed to mark an X in every box on the remainder
of the passage. The regular prose groups were told to note that all of the X's
had already been marked in the correct boxes, and that they should try to read
the material in a normal manner, disregarding the incorrect alternatives.

All groups were told that they would receive about 29 minutes to read the
passage, and that if they finished early they should go back and read the passage
again. A clock was provided indicating the amount of time remaining. All groups
were also told that the scores they made on the 100-item test would be mailed to
them. The high motivation groups were also told that they would receive 2¢ per
each correct answer on the 100-item test.

After the initial instructions were given, the Ss were instructed to begin
reading and the clock was started. The Ss were actually given 28 minutes and
42 seconds to read (an average rate of 12 programmed prose items per min.).
After the time limit was up, all the groups were given the test and told that: (a)
they would have 40 minutes to finish, (b) once they had finished a page and
turned to the next, they could not turn back and work on a previous page, and (c)
they should pace themselves by spending about 3 minutes per page and by keeping
themselves informed of the time remaining.

The low motivation groups were informed at this point, i.e., after reading
but prior to taking the test, that they would be paid a 2¢ bonus for each correct
answer. Thus, the motive-incentive condition was exactly the same for all
groups during the administration of the 100-item test.

There was an additional difference between the programmed prose and the
regular prose groups under the high motivation conditions. The programmed
prose group was also informed at the outset that they would be paid 1/2¢ per
each correct answer on the programmed prose materials.

For the Control Group the entire sequence of events was explained at the
outset, i.e., the pre-test, the reading, and the post-test. The Control Group
received regular prose, not programmed prose. This group also knew at the
outset that they would receive 2¢ per each correct answer on both the pre-test and
the post-test.

Passage. The passage was selected to be readable but difficult for college
students. It consisted of the first 1720 words from a journal article about a
computer model of reading (see Carver, 1971a). The programmed prose for the
passage was developed according to the procedures given for the reading-input
technique (see Carver, 1973c). There were 344 programmed prose items, i.e., one
item per each five words of running text. The Fig. 1 example is an excerpt from
the actual programmed prose. The passage was typed in all capital letters, i.e.,
simulated computer output, with 20 items per page. The items were in a vertical
column justified at the right-hand side of the page, i.e., there were four words
and a two-choice item per line making 20 lines per page. The title of the article
and the three section subtitles were given, and paragraphs were signified by
skipping a line. The passage was 19 pages long.

The RIDE Level difficulty of the passage was determined to be 5.3, i.e.,
Level 4, using the procedures described by Carver (1973d).

The passage for the regular prose condition consisted of the programmed
prose passage with the correct answers to the items already marked.

Tests. One P-Test item was written for each of the first 10 sentences
in the 1720 word passage. Then 10 RS-Test items were developed starting with
the next sentence. After the 10 RS-Test items were completed, 10 P-Test items
were written for the subsequent 10 sentences. This alternating between 10 P-Test
and 10 RS-Test items continued until there were 50 items for each test. The
development of these 100 items was the factor which determined the exact length
of the passage.

The P-Test items were developed according to the procedures outlined in
Experiment IIA and the RS-Test items were developed according to the procedures
outlined in Experiment IA.

There were 10 RS-Test items per page and 5 P-Test items per page, making
a total of 15 pages on the 100-item test.
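
The counts given for the passage and the test fit together arithmetically, as
the following sketch shows. The variable names are illustrative assumptions;
the figures (1720 words, one item per five words, alternating blocks of 10
items, 10 and 5 items per page, 12 items per minute) are taken from the text.

```python
PASSAGE_WORDS = 1720

# One programmed prose item per five words of running text.
programmed_items = PASSAGE_WORDS // 5

# Blocks of 10 items alternate between the two tests, starting with the P-Test.
block_order = ["P" if b % 2 == 0 else "RS" for b in range(10)]
items_per_test = {test: 10 * block_order.count(test) for test in ("P", "RS")}

# 10 RS-Test items per page and 5 P-Test items per page.
test_pages = items_per_test["RS"] // 10 + items_per_test["P"] // 5

# Reading time implied by an average rate of 12 programmed prose items per minute.
reading_minutes = programmed_items / 12
```

The item count comes out at 344 and the test at 15 pages, as reported. The
implied reading time is a little under 29 minutes, close to the 28 minutes and
42 seconds actually allowed.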

Data Analysis. There are large differences between individuals with
respect to their ability to store the information contained in prose materials.
Also, 11 or 12 individuals per group is not nearly enough to control for these
differences. Thus, the two standardized tests which were administered at the
outset provided a means for control over such differences, i.e., the Basic
Reading Rate Scale and the RL-4 test. The data were analyzed twice, using each
standardized test as a control.

The first analysis involved the Basic Reading Rate Scale, which provides for
the categorization of readers into four types: Beginning, Good, Better, and
Best. In this experiment there were individuals in each of the three higher
levels: Good, 18; Better, 36; and Best, 4. However, the distribution was not
the same between the five treatment groups. For example, one group had a frequency
distribution of 0, 6, 5 and another had a frequency distribution of 2, 7, 2.
In order to be able to generalize the results to the most representative group
of college students, the sparse data from the Ss in the Best and Good categories
were deleted. Thus, one data analysis consisted of the mean scores of the Good
Readers who participated in each treatment.

The second analysis involved the RL-4 test, which provided an efficiency of
reading score (number right corrected for guessing, per minute). The linear
regression of the test scores (i.e., RS-Test and P-Test) on the RL-4 scores was
computed for each of the six treatments. Then, the average slope, b, for the
six treatments was found, and this value was used to adjust the mean test scores
for group differences in reading ability. The above procedure was performed
twice, once for the RS-Test and once for the P-Test.
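
The exact adjustment formula is not given in the text. One common way to carry
out an adjustment of this kind is to shift each treatment mean by the average
within-treatment slope times the distance of that treatment's mean covariate
score from the grand mean. The sketch below follows that assumption; the
function names and the example scores are hypothetical and are not data from
this experiment.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y (test score) regressed on x (covariate)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def adjusted_means(treatments):
    """treatments maps a label to (covariate scores, test scores)."""
    # Average the within-treatment slopes to obtain a single value of b.
    b = mean([slope(x, y) for x, y in treatments.values()])
    # Grand mean of the covariate over all individuals.
    grand_x = mean([xi for x, _ in treatments.values() for xi in x])
    # Move each treatment mean to its expected value at the grand covariate mean.
    return {
        label: mean(y) - b * (mean(x) - grand_x)
        for label, (x, y) in treatments.items()
    }


# Hypothetical scores for two treatments (not data from the report).
example = {
    "A": ([1, 2, 3], [12, 14, 16]),
    "B": ([3, 4, 5], [11, 13, 15]),
}
adjusted = adjusted_means(example)
```

In this made-up example the raw means are 14 and 13, but group B has the
higher covariate scores. After adjustment the means are 16 and 11, so the
difference between the treatments is larger than the raw means suggest.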

Results. Both programmed prose groups were able to input the passage at
a 90% or above level, i.e., the scores ranged from 90 to 99%. The median score
for the low motivation, programmed prose group, Group B, was 98.5 and the
corresponding value for the high motivation group, Group D, was 96.9. This high level
of performance suggests that the difficulty level of the prose materials was not
greater than the ability levels of all the Ss in all five groups.

Table 11 contains the means and standard deviations for each of the six
treatments for each of the two types of tests, RS-Test and P-Test, and for both
data analyses, BRRS and RL-4.

Table 11

Means and Standard Deviations for the Six Treatments
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than the s..il„, standard deviation, 2.9, a oonsiderable difference. 
and Post-Control conditions were administered so as to be able to interpret the

Table 12

Fig. 19. Effectiveness of Regular Prose and Programmed Prose
as measured by the RS-Test and the P-Test under low and high motivation
conditions for the BRRS analysis.

Fig. 20. Effectiveness of Regular Prose and Programmed Prose as
measured by the RS-Test and the P-Test under low and high motivation
conditions for the RL-4 analysis (N=58).

Discussion. These data support the hypothesis that programmed prose acts
as a facilitator of attention and, as such, indirectly facilitates learning
under conditions wherein attention wanes. The results were just as had been
expected in this regard. When students were asked to read difficult material
under conditions wherein there is very little incentive to attend to the task
at hand, programmed prose was shown to increase the amount of information stored.
On the other hand, when students were highly motivated to learn, i.e., they were
being paid on the basis of how well they performed, programmed prose acted as an
inhibitor as compared to regular prose. It appears that the use of programmed
prose as a manipulator of attention and learning deserves further investigation.

Although the above results supported the interaction that was hypothesized,
there are several factors that tend to qualify the result. The results held
up under two drastically different analyses and two drastically different types
of tests. However, the limits of this generalization need to be tested in other
ways. For example, will these results generalize to lower ability individuals
who are presented lower difficulty materials? Or, will these results generalize
to a condition wherein many of the Ss do not have time to finish the programmed
prose and go back and read much of the material again, as they did in this
experiment? Furthermore, a close inspection of these data evokes a puzzling
question. Why did the high motivation individuals do worse than the low motivation
individuals when they were administered programmed prose? It would seem that the
two high motivation groups should have done better than the two low motivation
groups. Instead, the high motivation group which received regular prose stored
slightly less information than the low motivation, programmed prose group. These
results could be interpreted as indicating the potency of the programmed prose,
but the potency seems to interact with the motive-incentive conditions in
an unexplainable manner. Rather than trying to explain these data with an
anxiety factor, i.e., the high incentive groups became anxious and lost efficiency,
it seems more prudent to determine whether these data can be replicated.

Finally, it should not go unnoticed that the near-perfect replication of
RS-Test results and the P-Test results provides another bit of evidence for the
validity of the RS-Test as a measure of the primary effects of reading. There
is another result, in this regard, that supports the reliability of the RS-Test.
In Experiment IC, the gain from non-reading to the forgetting plateau was 20
percentage points for a group of passages which averaged about 5.1 on the RIDE
Scale. In this experiment, Experiment IIB, the gain from non-reading to reading on
the 5.3 difficulty level passage, a roughly comparable comparison, was 19 percentage
points in the BRRS analysis and 15 percentage points in the RL-4 analysis. It
appears that the RS-Test will provide roughly comparable results on an absolute
scale as long as the difficulty level of the passages and the ability level of
the individuals are controlled.

Conclusion. These data supported the hypothesis that programmed prose is
a manipulator of attention and thereby facilitates learning from prose materials
under low motivation conditions, and inhibits learning under high motivation
conditions. However, since those individuals who received programmed prose
materials did better under low motivation conditions than they did under high
motivation conditions, it seems prudent to conclude only that more should be
learned about the efficacy of programmed prose and its interaction with motivation.

Summary and Conclusions

Phase I

The purpose of Phase I was to investigate the validity of the reading-storage
test as a measure of the primary effect of reading, whether the primary effect
is called learning, information stored, understanding, or comprehension. Six
different experiments were conducted which were directly relevant to this purpose.

In the first three experiments, the reading-storage test was compared to a
modified version of the cloze test. Validity was investigated in one experiment
by investigating the sensitivity of the tests to reading prose, as compared to
not reading, at four levels of difficulty. In another experiment, the effect of
reading was manipulated by deleting various percents of the words from prose, and
the tests were compared with respect to the measurement of these changes. In
the third experiment, the sensitivity of the tests to forgetting was studied.
The reading-storage type of test appeared to be just as valid if not more valid
than the modified cloze type of test as a measure of the primary effects of
reading. The cloze procedure represents the only other technique besides the
reading-storage procedure which is completely objective from a test development
standpoint. Also, none of the cloze techniques appear to be more valid than the
reading-storage technique, while the reading-storage technique appears to be
about twice as valid as the regular cloze technique. Since the reading-storage test
appears to be just as valid as the most valid version of the cloze test, since
the reading-storage test can be scored objectively while the cloze test requires
subjective scoring, and since the cloze test appears to be the only other type of
objectively developed test appropriate for prose materials, it has been
concluded that the reading-storage technique represents a better dependent variable
than any other existing objective technique for measuring the primary effects
of reading.

In the Set II Experiments, the validity of the reading-storage test was
investigated further by comparing it to a technique which has more intuitive
appeal, the paraphrase technique. In one experiment, comprehension was
manipulated by using special paragraphs which were highly difficult to comprehend
without certain context cues. It was hypothesized that if the reading-storage
test was primarily sensitive to the memorization of words instead of the primary
effects of reading, then the score on the paragraphs given without context cues
should be the same as scores on the test given with context cues. Also, if
memorization is what the reading-storage test primarily measures, then the
reading-storage scores should be much higher on the test given without context cues than
they are on a test that is given without the opportunity to read the paragraph
first. The results indicated that there was slightly more gain, on the average,
due to the assumed comprehension conditions than there was due to the assumed
memorization conditions. More importantly, however, the reading-storage test was
more sensitive to the assumed comprehension effects than was the paraphrase test,
and it was less sensitive to the assumed memorization effects than was the
paraphrase test.

Two other experiments in Set II were replications of the two previously
described experiments, where validity was explored by comparing reading scores
with non-reading scores and by manipulating the effects of reading by deleting
varying percents of the words.

The paraphrase test was found to be somewhat more valid than the reading-
storage test in one experiment, but the reading-storage test was found to provide
much more consistent or reliable results than the paraphrase test. In the other
experiment, the two tests appeared to be equally effective.

Considering the results from all three experiments, the reading-storage test
appears to be just as valid a dependent variable as the intuitively more
appealing paraphrase test. Since the paraphrase test is developed subjectively, it is
always questionable to generalize the results beyond the experimenter who developed
the test. Therefore, if the paraphrase technique is not shown to be consistently
more valid than the reading-storage technique, it appears reasonable to conclude
that the reading-storage technique is more valid as a general method for measuring
the primary effects of reading. The completely objective reading-storage test
appears to provide a better technique than its two closest competitors, i.e.,
the cloze technique, which is developed objectively but scored subjectively, and
the paraphrase technique, which is developed subjectively but may be scored
objectively.

The Phase II data, which was collected primarily to investigate the
effectiveness of programmed prose, also provides support for the validity of the
reading-storage test because the paraphrase test results and the reading-storage
test results were essentially equivalent.


Phase II

The primary purpose of Phase II was to investigate the effect of programmed
prose upon learning. It was hypothesized that programmed prose would act to
increase attention in low motivation conditions, where attention would be expected
to wane, and thereby facilitate learning. It was also hypothesized that
programmed prose would act as a distractor in high motivation conditions, where
attention would not be expected to wane, and thereby inhibit learning. These
hypotheses were tested by administering regular prose and programmed prose under
low and high motivation conditions. Under the low motivation conditions, the Ss
were simply asked to do their best. Under the high motivation conditions, the
Ss were told that they would be paid on the basis of how well they did. The
results supported the hypotheses, i.e., programmed prose facilitated learning
under the low motivation conditions and inhibited learning under the high motivation
conditions. However, the scores from the low motivation, programmed prose group
were unexpectedly higher than the scores from the high motivation, [...]
prose group. It was concluded that the efficacy of programmed prose and its
interaction with motivation deserves to be investigated further using different
techniques for manipulating motivation and using different levels of prose
difficulty and individual ability.